Version 3.24 arrives nearly 4 months after the previous one and contains more than 400 commits that can be structured into several main categories (topics):

Table of Contents

Core: Observability

  • "alerts: clear node-restarted" 16cc3ad25 | * after 10h (currently hardcoded)
  • "gateways to count (GET, PUT, DELETE) errors; skip logging" 327bcc8c1
  • "(new) keep-alive error counter & keep-alive alert" 4e027cff4 | * add keep-alive error counter and alert, respectively: | - "err.kalive.n" | - "keep-alive-errors" | * keep-alive alert clears after 5 minutes (default) of no-change | * separately, bump minor versions: | - aisloader, cli, authn
  • "tls cert (re)loader: even more alerts" 3f3e50263 | * part eight, prev. commit: 55db4cb4638a41
    | alert | comment |
    | --- | --- |
    | tls-cert-will-soon-expire | warning: less than 3 days remains until X.509 cert expires |
    | tls-cert-expired | red alert (as the name implies) |
    | tls-cert-invalid | ditto |
  • "tls cert (re)loader: (valid, invalid, expired) state; more alerts" 55db4cb46 | * part seven, prev. commit: 1fe3c80bec3f | * with refactoring
  • "tls cert (re)loader: raise/clear alerts; follow-up" 1fe3c80be | * part six, prev. commit: 7c338a2494f07edb
  • "state flags as red alerts and cyan warnings" 1a8503f16 | * add helper for CLI (tbd) to use two respective colors | * prev. commit: c434fb65380359c
  • "state flags as red alerts and/or warnings" c434fb653 | * 'show cluster' to show red
  • "node state flags (cont-d)" b3e1adfe7 | * prev. commit: 3fcd0ef3c35c
  • "high number of goroutines: revise/amend; add node-state alert" 7324344cd
  • "observability: add head(object) counter, latency, and error-count" 7a15bb331 | * counting only remote heads for now
  • "observability: multipart-upload: add put/get metrics" 9fbf736d9
  • "observability: Prometheus labels (major)" f5d271bfb | * further split StatsD and Prometheus sources: | - statsValue - with no labels for Prometheus | - runner.reg - simplified --/-- | - units and naming: computed latencies are always reported in milliseconds, | computed throughput - in MB/s | - unlike respective "total"s that are always in nanoseconds and bytes, | respectively | - units and naming: use "bytes" suffix for all ".size" metrics | (formerly: "mbytes") | - uptime is now "uptime" (formerly, "up_ms_time") | * Prometheus: add all help descriptions | * part six, prev. commit: c9016261399c5c61
  • "observability: Prometheus labels" c90162613 | * cover common (target, gateway) metrics | * amend labeled helps | * continued refactoring | * part five, prev. commit: 406669f6a044e55
  • "observability: Prometheus labels" 406669f6a | * initialize StatsD or Prometheus at the right time, and not before | * part four, prev. commit: d77b3d76276e34c
  • "observability: Prometheus labels" d77b3d762 | * labels for disks and backends | (the rest TBD) | * revise and refactor; reduce code | * part three, prev. commit: 21c1cd6979c25a
  • "observability: Prometheus labels" 21c1cd697 | * part one
  • "make Prometheus default; left and right helpers" 5dd9704f7 | * Prometheus is now default for local playground as well
  • "stats: add io-error metrics (new); register statically" e2c6e0dd8 | * statically register all error counters | * io error counters: GET, PUT, DELETE | * simplify and refactor | * TODO: CLI to utilize IsIOErrMetric
  • "stats: remove cold-read-write metric (obsolete)" c8c187ef1 | * transitioned to using per-backend "total" latencies | * up cli
  • "observability: per-backend 'get' and 'put' metrics - part 2" 5fd101589 | new metrics | =========== | * <provider>.put.ns.total and <provider>.get.ns.total | * this is purely the time taken for AIStore to GET/PUT from/to a remote backend | * <provider>.e2e.put.ns.total and <provider>.e2e.get.ns.total | * total time of a GET/PUT request, respectively | * i.e., AIStore overhead + time to GET/PUT from/to a remote backend | * ver.change.n and ver.change.size per backend
  • "observability: per-backend 'get' and 'put' metrics (major)" 8a13453bd | * get-cold* and put-cold* are gone; <backend>.<verb> is the new convention | * backend to register their own metrics at construction time | * with substantial refactoring | * TODO: | - remote-del counter
  • "build-time choice: Prometheus (default) or StatsD" c01950002 | * statsValue primitive to contain only one respective label: | (StatsD or Prometheus) | * part four, prev. commit: 202f4cd2d3508f
  • "add docker image for authn container" b16b867f6
  • "build-time choice: Prometheus (default) or StatsD" 202f4cd2d | * part three, prev. commit: b334ff969211
  • "build-time choice: Prometheus (default) or StatsD" b334ff969 | * part two, prev. commit: 860c1369f059
  • "stats: physically separate Prometheus and StatsD; build and lint" 860c1369f | * new build tag: statsd | * update make and lint scripts and associated yaml | - add build and lint permutations | * extract common constants and helpers, reduce code duplication | * update docs: document statsd and other build tags | - remove AIS_PROMETHEUS environment
  • "extend 'show cluster' - add 'alert' column" 40d6580df | * move version & build to the summary | - show version (or build) with individual nodes iff there are different versions and builds, respectively | * add alert column; hide it iff all state flags are OK | - for enumeration, see cmn/cos/node_state_flags
  • "Prometheus: skip zero value metrics when collecting" 9155d283d
  • "disk metrics; CLI: verbose counters, empty version" bd927bc01
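
The unit conventions above (cumulative totals in nanoseconds and bytes, computed latencies in milliseconds, computed throughput in MB/s) and the split between `<provider>.get.ns.total` and `<provider>.e2e.get.ns.total` can be illustrated with a minimal Go sketch. The function names are hypothetical and not part of AIStore; the sketch also assumes 1 MB = 1024*1024 bytes, which the notes do not state.

```go
package main

import "fmt"

// avgLatencyMs turns a cumulative nanosecond total (such as
// "<provider>.get.ns.total") into an average per-request latency in ms.
func avgLatencyMs(totalNs, count int64) float64 {
	if count == 0 {
		return 0
	}
	return float64(totalNs) / float64(count) / 1e6
}

// overheadNs is the part of the end-to-end time spent inside AIStore itself:
// the e2e total minus the time taken by the remote backend.
func overheadNs(e2eTotalNs, backendTotalNs int64) int64 {
	return e2eTotalNs - backendTotalNs
}

// throughputMBps converts bytes moved during an interval into MB/s.
// Assumption (not from the notes): 1 MB = 1024*1024 bytes.
func throughputMBps(nbytes, intervalNs int64) float64 {
	if intervalNs == 0 {
		return 0
	}
	return float64(nbytes) / (1024 * 1024) / (float64(intervalNs) / 1e9)
}

func main() {
	const (
		gets      = 3
		backendNs = 6_000_000 // time spent in the remote backend
		e2eNs     = 9_000_000 // total request time
	)
	fmt.Printf("backend avg latency: %.2f ms\n", avgLatencyMs(backendNs, gets))
	fmt.Printf("e2e avg latency:     %.2f ms\n", avgLatencyMs(e2eNs, gets))
	fmt.Printf("aistore overhead:    %d ns\n", overheadNs(e2eNs, backendNs))
	fmt.Printf("throughput:          %.2f MB/s\n", throughputMBps(10*1024*1024, 2_000_000_000))
}
```

Keeping the exported totals in base units (nanoseconds, bytes) and deriving milliseconds and MB/s only at reporting time avoids rounding errors accumulating in the counters.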

Core: HTTPS; TLS

  • "add 'ais tls validate-certificates' command" 0a2f25cc7 | * also, add load-cert case for secondary proxies (fix)
  • "tls cert (re)loader: even more alerts" 3f3e50263 | * part eight, prev. commit: 55db4cb4638a41
    | alert | comment |
    | --- | --- |
    | tls-cert-will-soon-expire | warning: less than 3 days remains until X.509 cert expires |
    | tls-cert-expired | red alert (as the name implies) |
    | tls-cert-invalid | ditto |
  • "tls cert (re)loader: (valid, invalid, expired) state; more alerts" 55db4cb46 | * part seven, prev. commit: 1fe3c80bec3f | * with refactoring
  • "tls cert (re)loader: raise/clear alerts; follow-up" 1fe3c80be | * part six, prev. commit: 7c338a2494f07edb
  • "[API change] show TLS certificate details; add top-level 'ais tls' command" 091f7b0e0 | * Go API: add api/x509 source: | - api.LoadX509Cert | - api.GetX509Info | * CLI: add cmd/cli/cli/x509; consolidate all TLS in there | * CLI: add top-level ais tls; update all related docs and references | * prev. commit sequence: 3f3e5026323deed | * separately: | - aistore as reverse-proxy is obsolete - update the docs, add | disclaimer | - related (very old) commit: 2cc82126b0625b2a1d
  • "tls cert (re)loader: document; consolidate all HTTPS related topics" 7c338a249 | * part five, prev. commit: 5e92eff58c06ae3
  • "tls cert (re)loader: add admin API and CLI" 5e92eff58 | * to reload unconditionally (skipping cert-changed check) | * part four, prev. commit: 21807df84a7fdee
  • "tls cert (re)loader: mtime and size; fingerprint; flex. scheduling" 21807df84 | * rename all internals; refactor | * extend the state - keep mtime and size | * schedule housekeeper based on remaining time | * remove "fingerprinting" | * part three, prev. commit: 31d1a799f7e547
  • "tls cert loader: rewrite from scratch; intra-cluster clients" 31d1a799f | * from scratch, prev. commit 8107a3bb10a51478 | - original ticket: [NGNSDS-632] | * separately, introduce intra-cluster to differentiate clients | - NewTLS(..., intra-cluster)
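
The expiry-related alerts listed above can be sketched as a simple classification on the certificate's expiration time. This is an illustration only, not the actual (re)loader code: `certAlert` and `alertInHours` are hypothetical names, and `tls-cert-invalid` is not modeled because the notes do not say what makes a certificate invalid. With a real certificate, `notAfter` would come from the `NotAfter` field of a parsed `x509.Certificate`.

```go
package main

import (
	"fmt"
	"time"
)

// soonThreshold mirrors the "less than 3 days" rule quoted in the notes.
const soonThreshold = 3 * 24 * time.Hour

// certAlert returns the alert name for a certificate that expires at notAfter,
// or "" when no expiry-related alert applies.
func certAlert(notAfter, now time.Time) string {
	switch remaining := notAfter.Sub(now); {
	case remaining <= 0:
		return "tls-cert-expired"
	case remaining < soonThreshold:
		return "tls-cert-will-soon-expire"
	default:
		return ""
	}
}

// alertInHours is a demo helper: the certificate expires `hours` hours
// after a fixed reference instant.
func alertInHours(hours int) string {
	now := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	return certAlert(now.Add(time.Duration(hours)*time.Hour), now)
}

func main() {
	for _, h := range []int{-1, 48, 72, 24 * 30} {
		fmt.Printf("expires in %4dh: %q\n", h, alertInHours(h))
	}
}
```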

Core: Filesystem Health Checker (FSHC)

  • "FSHC v2: upgrade rc2 config to rc3" 49df73872 | * temp patch only to carry out interim upgrade within v3.24 | - (ref v324) | - (v3.23 => new fshc) and (3.24.rc2 => new fshc)
  • "Go API: HEAD(object) args; [config change]: IO errors" 64bf8b267 | * HEAD(object) API change, part one | * FSHC config change: IO error limit and duration | (formerly, soft error) | * with substantial refactoring
  • "FSHC v2: joggers to always check before running" 061a7c9ef | * related commit: ee3957913b5724
  • "assorted fixes: fs-linux; CLI iterations; FSHC; OOM periodic" 447115159 | * revise (mountpath => filesystem) resolution | - mountpath vs FS mountpoint; relative path | - refactor & cleanup; fix and add comments | * CLI: when iterating, perform aistore version check only the first time | - extend longRun singleton - add iteration count | * OOM periodic: flip CAS statement (typo) | * FSHC: flip CAS statement (ditto) | - refactor & cleanup | * FSHC config: reduce default soft-error limit to 10 (was 100)
  • "rescan disks; manually run FSHC (advanced use only)" 69ea283b8 | * two new admin APIs, CLI, and implementations
  • "never call FSHC with nil mountpath; reduce code" fdf575e20 | * also, remove erroneous assert
  • "FSHC v2" f24b5aa2f | * count (GET, PUT, DELETE) errors more precisely | - aka "soft IO errors" | * move fshc callbacks from target's put-object to lom | - create, rename | * part twelve, prev. commit: 43d88d22ae8c13
  • "FSHC v2: backward compatibility (config)" 11fa3cdd1
  • "follow-up" f242fd47e
  • "FSHC v2" 43d88d22a | * non-IO error correction | * add EBADFD, ECANCELED, os.NewSyscallError | * part eleven, prev. commit: a2d04da3a67ad
  • "[config change]: FSHC v2 (major)" a2d04da3a | * track and handle total number of soft errors | * extend fshc config; add new knobs | * revise health/fshc.go logic | * part ten, prev. commit: ee3957913b57242
  • "filesystem health checker (fshc) version 2" ee3957913 | * part nine, prev. commit: e0a312cd9fbac
  • "filesystem health checker (fshc) version 2" e0a312cd9 | * add 'err-mountpath-changed-at-runtime' | - list-objects will now detect it and trigger FSHC | - TODO: consider making the check inside Get() and GetAvail() | * part eight, prev. commit: 9ba4f97dee926
  • "filesystem health checker (fshc) version 2" 9ba4f97de | * fshc: resolve filesystem, compare IDs | * fshc: refactor 'run' method | * fs: amend fs.Equal | * core: lop.open to check bucket directory and possibly escalate | * ais: retry GET only when erasure-coded | * CLI: add format-bucket-name; cleanup | * part seven, prev. commit: f465a1a910a6faa
  • "filesystem health checker (fshc) version 2" f465a1a91 | * fs: now responsible to trigger FSHC - directly and in place | * fshc: additionally check fstat and statfs | * periodic target-stats: disk stats, with and without cap-refresh | * part six, prev. commit: 1ed2c1b69f1
  • "filesystem health checker (fshc) version 2" 1ed2c1b69 | * at runtime: resolve (mpath, FS) to disks, and handle: | - no disks | - disk loss | - new disk attachments | * part five, prev. commit: ccef8082e95794
  • "filesystem health checker (fshc) version 2" ccef8082e | * CLI alerts: | - add 'ais storage disk' | - amend 'ais storage mountpath' | * with refactoring | * part four, prev. commit: bda7bc9901ed73
  • "filesystem health checker (fshc) version 2" bda7bc990 | * CLI 'storage mountpath' to show alerts | * part three, prev. commit: 0319347b451e
  • "filesystem health checker (fshc) version 2" 0319347b4 | * add disk-fault alert | * <DISK NAME>[<alert>] convention, with suffix enumeration in fs/api.go | * part two
  • "filesystem health checker (FSHC) version 2" 733688e06 | * full rewrite | * part one
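
The "IO error limit and duration" configuration mentioned above suggests a threshold check along the following lines. This is a rough sketch under stated assumptions, not the FSHC v2 implementation: the type and field names are hypothetical, the fixed-window counting scheme is a guess, and the 10-second window is made up (only the default limit of 10 comes from the notes).

```go
package main

import "fmt"

// ioErrWindow counts IO errors within a fixed time window.
// All names here are illustrative, not AIStore identifiers.
type ioErrWindow struct {
	limit     int   // max tolerated errors per window
	windowSec int64 // window length in seconds (hypothetical value)
	start     int64 // unix seconds when the current window began
	count     int   // errors seen in the current window
}

// record registers one IO error at time nowSec and reports whether the
// limit has been exceeded in the current window, that is, whether it is
// time to run the filesystem health check.
func (w *ioErrWindow) record(nowSec int64) bool {
	if nowSec-w.start >= w.windowSec {
		// the previous window has elapsed: start counting afresh
		w.start, w.count = nowSec, 0
	}
	w.count++
	return w.count > w.limit
}

func main() {
	w := &ioErrWindow{limit: 10, windowSec: 10}
	for i := 1; i <= 11; i++ {
		fmt.Printf("error #%d: run health check = %v\n", i, w.record(1))
	}
}
```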

Core: Keep-Alive; Primary Election

  • "amend intra-cluster health ping" b550c0b2b | * w/ comments inline | * related: e3503fc67c38e02
  • "(new) keep-alive error counter & keep-alive alert" 4e027cff4 | * add keep-alive error counter and alert, respectively: | - "err.kalive.n" | - "keep-alive-errors" | * keep-alive alert clears after 5 minutes (default) of no-change | * separately, bump minor versions: | - aisloader, cli, authn
  • "keep-alive (follow-up)" 38c031c11 | * slow-keepalive: simplify-out check for DNS error | - related: a8a5c1e342a99 | * cold-GET: amend cleanup logic | - don't uncache (no need) | - don't remove copies (not produced yet) | - shorten EPIPE error message | * with authn enabled, 401/403 codes may be happening much more frequently | - with the potential to quickly generate megabytes of log records | - thus, making an exception | * on the related note, proxies must also count (GET, PUT, DELETE) errors
  • "retry primary keepalive (part four)" 89d06dbca | * with refactoring | * prev. commits: 9c961d591141, e3503fc67c38e | * separately, CLI 'ais log get' inline help
  • "amend primary election (part three)" 9c961d591 | * node => current-primary retry via pub-addr, if different | * with refactoring; logs | * part three, prev. commit: e3503fc67c38
  • "retry primary keepalive (part two)" e3503fc67 | * primary => node via palive.retry via pub addr, if different | * with refactoring | * part two, prev. commit: a8a5c1e342a99
  • "retry slow keepalive upon DNS lookup failure, given" a8a5c1e34 | * different control and pub hostnames
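
The keep-alive alert behavior described above (raise when the "err.kalive.n" counter changes, clear after 5 minutes of no change) can be sketched as a small state machine. The type, method, and helper names are hypothetical, and the polling approach is an assumption; only the 5-minute default comes from the notes.

```go
package main

import (
	"fmt"
	"time"
)

// clearAfter is the default quiet period quoted in the notes.
const clearAfter = 5 * time.Minute

// kaliveAlert tracks the keep-alive error counter (illustrative only).
type kaliveAlert struct {
	lastCount  int64
	lastChange time.Time
	raised     bool
}

// update is called periodically with the current counter value and
// returns whether the alert is raised after this observation.
func (a *kaliveAlert) update(errCount int64, now time.Time) bool {
	switch {
	case errCount != a.lastCount:
		// new errors: (re)raise the alert and restart the quiet period
		a.lastCount, a.lastChange, a.raised = errCount, now, true
	case a.raised && now.Sub(a.lastChange) >= clearAfter:
		a.raised = false
	}
	return a.raised
}

// simulate feeds counter samples taken at the given minute offsets.
func simulate(minutes []int, counts []int64) []bool {
	var (
		a   kaliveAlert
		t0  = time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
		out = make([]bool, 0, len(minutes))
	)
	for i, m := range minutes {
		out = append(out, a.update(counts[i], t0.Add(time.Duration(m)*time.Minute)))
	}
	return out
}

func main() {
	// counter grows at minute 1, stays flat, then grows again at minute 9
	fmt.Println(simulate([]int{0, 1, 3, 6, 9}, []int64{0, 2, 2, 2, 3}))
}
```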

Core: Rebalance; Erasure Coding: Intra-Cluster streams

  • "close EC streams when idle, reopen on demand" b471cb1d6 | * gateways: remove open-ec-streams logic from bucket initialization | (no need)
  • "close EC streams when idle, reopen on demand (major)" 5eb467789 | * remove entire code to (statically) open streams based on BMD | * upon inactivity timeout go ahead and close EC streams | * part four, prev. commit: 0642c8572832e
  • "close EC streams when idle, reopen on demand" 0642c8572 | * refactor and amend housekeeper | - use UnregInterval consistently across | - add UnregIf | - reduce work chan capacity to 48 (was 512); add "channel full" check | * target: implement on-EC/off-EC handler | - TODO: revisit 1m delay | * part three, prev. commit: 11148689394e
  • "rebalance vs dynamic EC streams; housekeeping; dsort; downloader" a91636bc3 | * open/close and ref-count EC streams when rebalancing | * consolidate common housekeeping durations | - xactions | - notifications | - transactions | * dsort & downloader: housekeep upon the first respective usage | * with substantial refactoring
  • "[config change] close EC streams when idle, reopen on demand (major)" 111486893 | * cluster config: add "ec_streams_time" | * proxy: onEC when initializing bucket | - s3 and, separately, native API | * new sources: ais/prxec and ais/tgtec | - add target /v1/ec endpoint | * refactor; reduce copy/paste; remove unused code | * miscellaneous micro-optimizations | * part two, prev. commit: d8a71bb59fb0a18d
  • "close/reopen EC (intra-cluster) streams on demand (major)" d8a71bb59 | * ref-count EC xactions (jobs) | - incActive and notification callback | * EC active/inactive state is now cluster-wide information; works as | follows: | * piggyback on keep-alive heartbeats | - target => (fastKalive) => primary | - primary => (fastKalive response) => non-primary | * part one

Core: List Virtual Directories

  • "list-objects: sort virtual dirs first, objects second" b630043f7 | * part four, prev. commit: da0606fa17eec
  • "list-objects: amend listing virtual ('synthetic') dirs" da0606fa1 | * aws and gcp backends to handle virtual dirs, set 'is-dir' bit | * (azure TBD) | * always return virtual directories (if any) - unless | explicitly disallowed via '--no-dirs' switch | * move '--no-dirs' logic to the backends | * part three, prev. commits: 0b14d0ec37b10, a9773251e7f78
  • "S3: list-objects to return all virtual subdirectories" 0b14d0ec3 | * when listing with apc.LsNoRecursion flag (CLI --non-recursive)
  • "list-objects: skip virtual directories" a9773251e | * new bit flag in the control message: LsNoDirs | * CLI as well | * prev. commit: 02843305c19ce63
  • "list-objects: skip virtual directories" 02843305c | * but only when using listed results to allocate LOMs (e.g., copy, prefetch)
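
The listing rules above (virtual directories first, objects second, with directories dropped when '--no-dirs' is given) can be sketched as follows. The `entry` type and both functions are hypothetical stand-ins, and the lexicographic order within each group is an assumption rather than something the notes specify.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is a simplified list-objects result; isDir stands in for the
// 'is-dir' bit set by the backends.
type entry struct {
	name  string
	isDir bool
}

// sortDirsFirst orders virtual directories before objects; within each
// group entries are sorted by name (an assumption made for this sketch).
func sortDirsFirst(entries []entry) {
	sort.SliceStable(entries, func(i, j int) bool {
		if entries[i].isDir != entries[j].isDir {
			return entries[i].isDir
		}
		return entries[i].name < entries[j].name
	})
}

// filterNoDirs mimics the effect of the '--no-dirs' switch.
func filterNoDirs(entries []entry) []entry {
	out := make([]entry, 0, len(entries))
	for _, e := range entries {
		if !e.isDir {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	es := []entry{
		{name: "b.txt"},
		{name: "z/", isDir: true},
		{name: "a.txt"},
		{name: "m/", isDir: true},
	}
	sortDirsFirst(es)
	fmt.Println("dirs first:", es)
	fmt.Println("no dirs:   ", filterNoDirs(es))
}
```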

Core: API changes; Config changes

  • "[API change] show TLS certificate details; add top-level 'ais tls' command" 091f7b0e0 | * Go API: add api/x509 source: | - api.LoadX509Cert | - api.GetX509Info
  • "transport header & burst size can now be set at runtime" 0a54545e1 | * intra-cluster transport: the two knobs were readonly | - not anymore | * with refactoring
  • "[API change]: extend HEAD(object) to check remote metadata" c1004dd2b | * add support for QparamLatestVer ("latest-ver") | - HEAD is now similar to GET(object) | * when checking local/remote equality, return specific cause: | what exactly failed to match | * part two, prev. commit: 64bf8b26721a90
  • "Go API: HEAD(object) args; [config change]: IO errors" 64bf8b267 | * HEAD(object) API change, part one | * FSHC config change: IO error limit and duration | (formerly, soft error) | * with substantial refactoring
  • "new admin API: disable/enable cloud backend at runtime" 779a7b9f2 | * still remains CLI and docs | * and also removing: | 'ais config cluster backend.conf='{"gcp":{}, "aws":{}}', and similar
  • "add Go API to query configured backend providers" dac041bbe | * (the corresponding fields in the cluster config are hidden)
  • "[config change]: FSHC v2 (major)" a2d04da3a | * track and handle total number of soft errors | * extend fshc config; add new knobs | * revise health/fshc.go logic | * part ten, prev. commit: ee3957913b57242
  • "[API change] do not accept node URL - always require node ID" 482e720f3 | * up cli
  • "[API change] do not accept node URL - always require node ID" 071ddea92 | * when not in cluster map, validate via "self-removed" history | * (security)
  • "Go API: add EnableRebalance and DisableRebalance" f5deb20a2
  • "(config, log): use 'log.stats_time' if defined" 053ce1175
  • "[Go API change] new field BaseNameOnly in ArchiveMultiObj API" 70165c8d3 | this boolean field specifies only extracting base names as names of archived | objects; dsort must recognize the record keys correctly
  • "[Go API change] new get-stats to show disk IOPS and capacity, both" 5cdab34d6 | * Go API: add get-any-stats | * CLI: ais storage disk & performance disk to include (used%, avail) capacity | * separately, fix scripts/install_from_binaries.sh
  • "[API change] get-bucket-info to support prefix option" 56935d414 | * e.g.: 'ais ls ais://nnn --summary --prefix=aaa/bbb'
  • "mark Conf field in BackendConf as not marshalable" 5507ee016

Core: Performance Optimization; Bug fixes; Improvements

  • "fix log-removal regression" 7e928e53f | * when total size exceeds configured maximum | * with refactoring
  • "datapath query (dpq)" 8c3b9438e, fea2b63ba | * remove redundant constants - use s3 header prefix instead | * remove debug assert | * error message to include raw query
  • "add docker image for AIS utilities" e9827b983 | - rename admin to ais-util | - update packages and optimized layers
  • "new RMD not to trigger rebalance when disabled in the config" 550cade20 | * several distinct scenarios: by user, RMD with action message, | RMD without message (ref) | * extra check in the latter case | * still trusting local copy of the cluster config, though
  • "add support for GCP and AWS backends in Google Colab" e53aaf35a
  • "target startup: configured backends vs linked backends" 056d4f9cc | * failure to initialize a real (non-mock) backend - is fatal | * with minor refactoring for clarity
  • "cleanup fs-path-error only in API responses" 6cfa2754e | * and not anywhere else
  • "python, CLI, EC (follow-up)" e3b646378 | * python: prefetch w/ num-workers | * up cli | * close/reopen EC streams: negative timeout (fix) | * list-range xactions: minor ref
  • "prefetch/copy/transform: number of concurrent workers" a5a30247d, 8aa832619 | * prefetch (job): | - extend apc.PrefetchMsg control: add num-workers | * CLI ais prefetch: add '--num-workers' option | * copy-objects/transform-objects (jobs): | - extend apc.TCOMsg: add num-workers | * amend & revise common list-range iterator (lrit) | * with refactoring
  • "delete/evict objects: fix overcounting io errors" 89acff646 | * (evict-objects & not-in-cluster) - is a valid combination | * related config: fshc.io_err_limit | * related commit: e2c6e0dd877e45d71
  • "[gc logs] compute total size in a housekeeping callback" 350bbff8a | * use a separate goroutine if and only if exceeded configured limit | * (micro-optimizations)
  • "build rc4; fixes" 8fd68450c | * v3.24.rc4 | * rewrite cos.SaveReader and friends | * universally use cos.Remove | * introduce err-bdir | * log open/close-ec-streams on both sides
  • "add write-xid (micro-optimizations)" 68858f950
  • "housekeeper to pass monotime to a callback" b17f42234 | * with minor cleanup, micro-opt
  • "superfluous response; non-existence; CLI ec-encode v2" 713920710 | * fix "superfluous response" from HEAD(bucket) when given invalid URL | * fix fs/walk vs concurrent object deletion | - non-existence (condition) includes missing-metadata | * CLI: revise/rewrite 'ais start ec-encode' and 'ais start mirror' | * with refactoring and renaming
  • "intra-cluster notifications: reduce locking, mem allocations" b7965b7be | * micro-optimize
  • "cmn: CopyProps for use externally (K8s operator)" 361595619
  • "micro-optimize obj props to http headers conversion" 212d2f72f | * target obj-head handler: use pre-mapped headers, avoid repetitive churn | * api/apc: fuse textproto logic; optimize and simplify | * part two, prev. commit: 530288d44fa5fe26
  • "micro-optimize obj props to http headers conversion" 530288d44 | 1. canonicalize all header constants | 2. add static map: [internal prop name => canonical header name]
  • "follow-up: amend log" 75309301c | * amend log: ec & transport/bundle packages
  • "ios: fix handling of devices with empty physical_block_size" f8fd327f3, 595e26261 | * prev. commit: 7324344cd3c
  • "global rebalance vs targets that are being decommissioned" 56f7347a4 | from the rebalancing perspective, a target node that is in maintenance mode or | that is being decommissioned must still be considered "active" unless | this target has already reached post-rebalancing (SnodeMaintPostReb) state
  • "follow-up: rebalance; archive" 727e45da9 | * reb: log header by strings builder | * archive: refactor 5c94da9ceb043
  • "transport header & burst size can now be set at runtime" 0a54545e1 | * intra-cluster transport: the two knobs were readonly | - not anymore | * with refactoring
  • "follow-up: remove changes for TLS client" 6c0979615
  • "[NGNSDS-632] TLS support" 8107a3bb1
  • "ios: remove lsblk cmd, add sysfs-block parsing logic instead" 19c1041df | * this removes the last linux command (executable) that aistore itself | used to run | * with refactoring
  • "(intra-cluster transport, rebalance): channels, log, more log, refactoring" 9dcbfad7d | * make the transport module's verbosity settable at runtime | - redo most of the verbose logging | * transport/stream collector: | - increase chan size | - periodically dump idle streams, if any | * reb and transport/bundle modules | - add begin/end log records | * add open/close/abort log records | * LZ4 compressed stream (state) is now a pointer | * add yet another scripted test (target IDs hardcoded) | * with minor refactoring
  • "API: fix panic on setting query parameters" fbcb90382
  • "'uname' is a pointer" 681e4c497 | * fix lcache re-caching | * regression: e6045456d449c
  • "add SECURITY.md to outline security policy and supported versions" 479b4ca68
  • "logger: micro-optimize time stamping" 455872f67
  • "aisloader: minor fix" 5499e584b
  • "recognize DNS lookup error" cec2e83c0 | * and retry, if need be
  • "CIDR to select public IP upon node's startup" e685b402a | * new env var AIS_PUBLIC_IP_CIDR | if defined, will take precedence over AIS_CLUSTER_CIDR | * for comments, see api/env/ais | * part two, prev. commit: 8defcb378bb508
  • "reuse local-redirect CIDR to select public IP (to listen on)" 8defcb378 | * at node's startup, if its config.HostNet.Hostname is empty: | - list local unicast IPs; | - if there's more than one: use local-redirect CIDR to make the selection. | * in effect, reuse local-redirect CIDR for the second purpose | * with refactoring and comments inline
  • "mountpath joggers for archive jobs" b7d076534
  • "archive multi-object: create a shard when" 9b3513273 | * when doesn't exist, even when control-message.append is true
  • "feat: add exclude reaction type upon finding missing extensions" 2d4b78bfc | * refactor structure and logic of MissExtReact | * implement exclude action that removes any incomplete sample if it doesn't contain all required extensions; it also removes all unnecessary extension files
  • "strings: rather right; minor" d5b75b650
  • "fix maximum-total-log-size handling" 6ffcadab0 | * log names do not contain ".log." anymore
  • "micro-optimize multihoming; post-initialize cluster map" 8a2e7a3b7 | * add snode net-namer with two implementations: (single, multi) | * perform residual init on each new cluster map instance
  • "ais: fix node-join return values to avoid panic" 21cce483f
  • "ais: properly start listening on all extra pub interfaces" dc551e6c6
  • "metasync: amend GFN notifications" abffbe3d4 | * num connection-refused retries: sync vs notify | * metasync-notify: never reset handle-pending timer | * add err-work-channel-full, and use it | * with minor refactoring | * part two, prev. commit: 1a01903358fcb0
  • "metasync: amend GFN notifications" 1a0190335 | * always notify via metasync-post | * add extra checks when not to
  • "fix: always update black to latest version in fmt-fix and fmt-check" 162dfef6d
  • "s3: presigned HEAD request" f6fcd7c3f | * do not read HEAD resp. body | * with refactoring | * prev. commit: 5efbfcd8626d75
  • "don't use io.read-all (micro-optimization)" 470f0379f | * part two | * HEAD request vs Content-Length; comments | * refactor: presigned s3; k8s client
  • "don't use io.read-all (micro-optimization)" 6c8b7c0f0 | * add cos.ReadAll and cos.ReadAllN | * with minor refactoring (htrun)
  • "use fixed size arrays (ref)" 2c7b1d928
  • "dpq parsing: max num iterations (minor)" 1fe096c8b
  • "datapath query parameter (dpq) parsing" 2edb03400 | * debug or no-debug: keys must be known or excepted
  • "follow up" e84c966c2
  • "general: fix DisableColdGET feature and add tests" cf7933392
  • "follow-up" e38e68810
  • "refactor: use goroutines to execute archive API calls asynchronously" 25b7cd8c6
  • "follow-up (fspaths)" 6a960950e
  • "ais: pass original request to HeadObj" 5efbfcd86
  • "expose raw get/put latency metrics" d0fa2aca6
  • "feat: used cos.ParsedTemplate to parse and generate shard's name" f0d3b6ff6
  • "GOMAXPROCS, et al." e2a68fa77 | * revisit, add comments | * with minor refactoring
  • "fix: Handle edge case of remainder records after sharding" 4ec9c9046
  • "fix: remove wait for Object.promote on synchronous execution" f7c581daa
  • "misc: remove statsd from aisnode image" d259b8b44
  • "disable/enable cloud backend at runtime" acdfb0398 | * two-phase commit | * CLI 'ais advanced [enable/disable]' | * part two, prev. commit: 779a7b9f201e
  • "misc: cleanup prod K8s container images" e3f964685
  • "micro-optimize 'lmeta.unpack'" b488be001
  • "aisloader: add '--list-dirs' option to list virtual subdirectories" c69d58af9
  • "new object metadata type: chunk" 0d915b0f1 | * cos.UnsafeS, cos.UnsafeB | * with refactoring | * part one | * related commits: 397e1e2c, 57b94581, 3816bfbf
  • "further isolate access to LOM internals; fstat" 397e1e2cc | * prev. commit: f882ef45732cc
  • "return writer, not file; EC restore-replica; fast append" 57b945817 | * with partial rewrite: | - GET => EC restore-replica | - fast append to TAR | * tests: append to arch: more stress | * part two, prev. commit: 887bb0544e6497
  • "append to TAR: remove redundant fseek, simplify" 51a22738e
  • "return writer, not file; add create-part, create-slice" 887bb0544 | * part one, related commit: 3816bfbf82094
  • "etl: simplify and refactor 'inline | offline' transforms" cdbb82801 | * push, redirect, and reverse
  • "return object reader, not file" 3816bfbf8 | * cos.LomReader
  • "further isolate access to LOM internals" f882ef457 | * lom.FQN
  • "BID bitwise structure and flags (object metadata)" cf211aba7 | * part three, prev. commit: 8e961d76473699
  • "BID bitwise structure and flags (object metadata)" 8e961d764 | * part two, prev. commit: 054fecdc94fadc
  • "BID bitwise structure and flags (object metadata)" 054fecdc9 | * part one
  • "ios: replace du with raw syscalls" 15c20536c
  • "storage and bucket summary: move and parallelize on-disk sizing" 2ad584561 | * remove fs.OnDiskSize ('du') from the job's BEGIN phase | * run it in parallel with walking objects and, possibly, counting | remotes | * refactor newSumm construction
  • "object 'hrw-fqn' is now a pointer" 48b5f2a9b | * part three, prev. commit: e6045456d449cf
  • "object 'uname' is now a pointer" e6045456d | * part two, prev. commit: f337216fc28b7
  • "idle job timeout is now atomic; version is now a pointer (part two)" 7896e263d
  • "object version is now a pointer" f337216fc | * [backward compatibility] was a string
  • "rename avail-paths (minor, ref)" f25f3042d
  • "ios: remove unused functions" d40b97304
  • "bucket summary: fix begin timeout" bc1ee691f
  • "fs: replace running sh, df and awk with /proc/mounts read" 7c4dd1c3a
  • "ext/dload: remove unused returns (minor)" 86939a750
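
The CIDR-based selection of a public IP described in this section (list the local unicast IPs and, if there is more than one, use a CIDR to pick; AIS_PUBLIC_IP_CIDR takes precedence over AIS_CLUSTER_CIDR) can be sketched with the standard `net` package. The function names are hypothetical, and the addresses are hardcoded for determinism; a real node would presumably read the environment and enumerate its own interfaces.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// chooseCIDR applies the precedence rule: the public-IP CIDR, when
// defined, wins over the cluster-wide one.
func chooseCIDR(publicCIDR, clusterCIDR string) string {
	if publicCIDR != "" {
		return publicCIDR
	}
	return clusterCIDR
}

// selectIP returns the only address if there is exactly one; otherwise it
// returns the first address that falls inside the given CIDR.
func selectIP(addrs []string, cidr string) (string, error) {
	if len(addrs) == 1 {
		return addrs[0], nil
	}
	_, ipnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return "", err
	}
	for _, a := range addrs {
		if ip := net.ParseIP(a); ip != nil && ipnet.Contains(ip) {
			return a, nil
		}
	}
	return "", errors.New("no local address matches " + cidr)
}

func main() {
	local := []string{"192.168.1.5", "10.0.0.7"} // example addresses
	cidr := chooseCIDR("10.0.0.0/8", "192.168.0.0/16")
	ip, err := selectIP(local, cidr)
	fmt.Println(ip, err)
}
```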

Initial Sharding (ishard); Distributed Shuffle (dsort)

  • "dsort: rename order_file to EKM and improve EKM file parsing logic" ccbeefe27 | * the term order_file was misleading, as it suggested functionality related to "ordering," whereas its purpose is only to provide rules for categorizing source records without any specific order; renaming it to EKM clarifies its role and makes the code and API spec more intuitive | * enhance EKM file parsing logic by removing the reliance on file extensions | (new logic now auto-detects the file type by first attempting to parse it as JSON, | and then falls back to line-based parsing if that fails).
  • "support and validate template format strings in dsort and ishard EKM" 072294682 | * stats/target log to always log red | * flags string formatting: | - e.g. single: "OOS", multiple: "[OOS OOM]" | * part two, prev. commit: 40d6580df689370
  • "fix: adjust minimum file size check for compressed archive types" 5c94da9ce | - when a file is smaller than the block size and padded with zeros, compression can remove the padding, resulting in a file size less than the block size | - for compressed archive types, the check now only ensures the size is sufficient to detect the corresponding magic numbers, rather than strictly adhering to block size
  • "support count-based shard_size config in ishard" 767dbb660 | * rename all max_shard_size to shard_size | * refactor isharder archive logic to keep current shard size/count as | internal state
  • "exclude samples not specified in ishard EKM config" fb8296e12
  • "enable ishard to use regex-based external key map through dsort" 5580f0603
  • "support regex in dsort external key map (EKM)" d78e4b4fa | * supports using regex as the record identifier to match multiple records into a pattern flexibly | * ishard needs this feature to use prefix as regex to pack all records under the same virtual directory into a pattern
  • "display ishard effective total object size after applying missing ext react" 3768b73f2 | * rename MissingExtAction to MissingExtManager to reflect its role in managing all information about extensions, including their effective object size. | * display the re-calculated effective object size in a progress bar.
  • "logs: replace nlog with fmt for ishard stdout" 0bbdc869d | * nlog shouldn't be used for just printing info to stdout/stderr. | replaced with fmt instead. | * added error handling in archive goroutines
  • "integrate dsort with ishard" 25495d8b1 | * support alphanumeric/shuffle algorithms and associated configs | * replace all log with nlog when parsing CLI params | remains: | - support content sort and external key map | - enable dsort dry-run preview from ishard
  • "enable IEC, SI formats for max_shard_size config in ishard" 661c8a73f
  • "add dry-run option for ishard with expected shards layout" d990ada77
  • "implement configurable record key and report missing extensions" a1b58cf47
  • "add progress bar for ishard execution" 01f9b3423
  • "enable ishard prefix option for specifying source files to include" a2a0445d4
  • "restructure ishard package as a standalone executable" 93a80736e
  • "(new) ishard utility to archive objects according to subdirectory paths" 252f9526a
  • "dsort: fix placement of error check" 5802f1151
  • "dsort(EKM): implemented ExternalKeyMap as shards format option" 49b0668fc
  • "dsort: supported algorithms (alpha, shuffle, content key) in DsortFramework" 308cd2c7c
  • "dsort: implemented and tested python datatype DsortFramework in SDK" 6d3aa4b6c
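
The EKM parsing change in ccbeefe27 (try JSON first, then fall back to line-based parsing) can be sketched roughly as follows. This is an illustration only, not the dsort implementation: the function name `parse_ekm`, the JSON shape (shard template mapped to a list of record keys), and the line format (`<key> <template>` separated by whitespace) are all assumptions.

```python
import json
from collections import defaultdict
from typing import Dict, List, Optional


def parse_ekm(text: str, sep: Optional[str] = None) -> Dict[str, List[str]]:
    """Return {shard_template: [record_key, ...]} without looking at a file extension.

    Hypothetical helper: the input formats are assumed, see the lead-in.
    """
    # Step 1: attempt JSON, regardless of how the file is named.
    try:
        parsed = json.loads(text)
        if isinstance(parsed, dict):
            return {tmpl: list(keys) for tmpl, keys in parsed.items()}
    except json.JSONDecodeError:
        pass  # not JSON: fall through to line-based parsing

    # Step 2: line-based fallback, one "<key><sep><template>" pair per line.
    ekm = defaultdict(list)
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(sep)  # sep=None splits on any whitespace
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected '<key> <template>', got {line!r}")
        key, tmpl = parts
        ekm[tmpl].append(key)
    return dict(ekm)
```

For example, `parse_ekm('{"train-%d.tar": ["a", "b"]}')` and `parse_ekm("a train-%d.tar\nb train-%d.tar")` both yield the same mapping, which is the point of dropping the extension check.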

Authentication; Access Control

  • "CLI: update token filepath handling, directory creation, and error management" 60b137ffc | - rename function tokfile to getTokenFilePath for better clarity. | - rename instances of tokenFile in logoutUserHandler, loginUserHandler, and revokeTokenHandler | with tokenFilePath for better clarity. | - replace os.Create with os.CreateDir and revert to returning the token filepath and err in getTokenFilePath. | - streamline error handling for getting token file path. | - replace os.Remove with cos.RemoveFile in logoutUserHandler.
  • "CLI: create tokenfile if absent during login" 759cb7f2a
  • "fix proxy access check" ff86daefa
  • "show-cluster is a cluster-level operation (fix)" 08c038235
  • "add Show CLuster as Cluster-Level Op" 049a3a300
  • "log user-name for failed operations" d0510d879
  • "authn config: add json tag to unexported fields" 622cca17d | * cli
  • "authn config: add json tag to unexported fields" e24c869c5 | * fields previously added in this commit: 73682b80d403
  • "add default config to docker image" 0ef7fcd3f
  • "override configuration with environment variables for server" bc10234c7
  • "update authn container entrypoint to use new config path" 0259869ab
  • "add support for env vars for admin creds and secret" d50c2237a | - add support for the following environment variables: | * AIS_AUTHN_SECRET_KEY: Secret key for token signing | * AIS_AUTHN_SU_NAME: Admin username | * AIS_AUTHN_SU_PASS: Admin password | - documentation updated accordingly
  • "refactor, maintain" 73682b80d | * config: remove rlock; use pointers
  • "refactor User entity and remove cluster ID from LoginMsg" 658058e26 | - remove roles string[] from the User entity to simplify role management. | - remove cluster ID from the login message to streamline the login process.
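
The environment variables listed in d50c2237a, together with the override behavior named in bc10234c7, suggest a simple precedence: a value from the environment replaces the value from the config file. A minimal sketch of that idea is below. The config field names (`secret`, `su_name`, `su_pass`), the helper name `apply_env_overrides`, and the choice to treat an empty variable as unset are assumptions made for illustration; only the three variable names come from the notes above.

```python
import os
from typing import Dict, Mapping, Optional

# Variable names are from the release notes; the target field names are hypothetical.
ENV_TO_FIELD = {
    "AIS_AUTHN_SECRET_KEY": "secret",   # secret key for token signing
    "AIS_AUTHN_SU_NAME": "su_name",     # admin username
    "AIS_AUTHN_SU_PASS": "su_pass",     # admin password
}


def apply_env_overrides(
    config: Dict[str, str], env: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Return a copy of `config` with environment values taking precedence."""
    env = os.environ if env is None else env
    merged = dict(config)  # never mutate the caller's config
    for var, field in ENV_TO_FIELD.items():
        value = env.get(var)
        if value:  # unset or empty variables leave the file value in place (assumption)
            merged[field] = value
    return merged
```

Passing `env` explicitly keeps the function easy to test; in a real deployment the default `os.environ` would be used.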

CLI

  • "add 'AIS_AUTHN_TOKEN' environment" 79543fd6a | * add AIS_AUTHN_TOKEN (value) env variable | - not to confuse with AIS_AUTHN_TOKEN_FILE | * refactor api.LoadToken
  • "add parse-retries-flag to reuse" fcaba7c85
  • "ais put <multiple-files> will now generate a list of failures" 01e6e484f | * e.g. Error: failed to PUT 13 files ("/tmp/.ais-put-failures.3343698.log")
  • "'ais put --retries ' with increasing timeout, if need be" 99b7a961a
  • "when 'ls bucket/objname' becomes 'ls bucket --prefix objname'" e3febd2a4
  • "remove 'cluster restart required' warning for auth.enabled config" 6299e3313, 4435c92af
  • "add 'ais tls validate-certificates' command" 0a2f25cc7 | * also, add load-cert case for secondary proxies (fix)
  • "follow-up" 508b8add5 | * 'ais cluster set-primary --force' | * up cli
  • "[API change] show TLS certificate details; add top-level 'ais tls' command" 091f7b0e0 | * Go API: add api/x509 source: | - api.LoadX509Cert | - api.GetX509Info | * CLI: add cmd/cli/cli/x509; consolidate all TLS in there | * CLI: add top-level ais tls; update all related docs and references | * prev. commit sequence: 3f3e5026323deed | * separately: | - aistore as reverse-proxy is obsolete - update the docs, add | disclaimer | - related (very old) commit: 2cc82126b0625b2a1d
  • "when command usage is multi-line; amend and refactor cli/docs" b6653a2f8 | * const Usage (refactor all sources) | * combine ais cp help and documentation; fix all cross-refs | * docs: add '--num-workers'
  • "disable/enable cloud backend at runtime" acdfb0398 | * two-phase commit | * CLI 'ais advanced [enable/disable]' | * part two, prev. commit: 779a7b9f201e
  • "copy/transform: number of concurrent workers" 2414c6898 | * CLI ais cp : add '--num-workers' option | * CLI ais etl: ditto | * assorted: | - github-CI python | - tests: skip ec-destroy-bucket | - fs-path-error: empty path is now treated as not
  • "user-friendly "did you mean" message" d015ea5e6 | * erroneous 'ais show gs://abc[/object]' will now produce the right hint | * up cli
  • "fix rendering issues for EC xactions output" 94578e9f0
  • "extend 'show cluster' - add 'alert' column" 40d6580df | * move version & build to the summary | - show version (or build) with individual nodes iff there are different versions and builds, respectively | * add alert column; hide it iff all state flags are OK | - for enumeration, see cmn/cos/node_state_flags
  • "tls config validation - make it a warning" 88902b55b
  • "assorted fixes (minor)" 8f33765ee | * red NetworkError | * storage summary: rm wrong assert | * inline tips | add rule to CI that triggers python SDK AuthN tests when there are changes made to relevant files or tests.
  • "an option to show calendar date and hh:mm:ss timestamp, both" 6702cfa73 | * e.g.: | - $ ais show job --date-time --all | - $ ais show rebalance --date-time
  • "CLI/AuthN: update token filepath handling, directory creation, and error management" 60b137ffc | - rename function tokfile to getTokenFilePath for better clarity. | - rename instances of tokenFile in logoutUserHandler, loginUserHandler, and revokeTokenHandler | with tokenFilePath for better clarity. | - replace os.Create with os.CreateDir and revert to returning the token filepath and err in getTokenFilePath. | - streamline error handling for getting token file path. | - replace os.Remove with cos.RemoveFile in logoutUserHandler.
  • "CLI/AuthN: create tokenfile if absent during login" 759cb7f2a
  • "tls config validation; user-friendly error messages; config reset" 5f9454a08, 83a186baa | * all of the above, plus: | - do not initialize TLS client unless required | - and vice versa
  • "use Go API to query configured backend providers" 756a653ae
  • "CLI (follow-up)" 3fdb3e6c1 | * CLI: warn mountpath with no disks (and labels) | * CLI e2e: need to wait longer when cluster has a lot of data | * stats/target: fix error message typo
  • "assorted fixes: fs-linux; CLI iterations; FSHC; OOM periodic" 447115159 | * revise (mountpath => filesystem) resolution | - mountpath vs FS mountpoint; relative path | - refactor & cleanup; fix and add comments | * CLI: when iterating, perform aistore version check only the first time | - extend longRun singleton - add iteration count | * OOM periodic: flip CAS statement (typo) | * FSHC: flip CAS statement (ditto) | - refactor & cleanup | * FSHC config: reduce default soft-error limit to 10 (was 100)
  • "show configured backend providers" 663d98975 | * ref ba492a11a580d2e | * up cli
  • "show configured backend providers" ba492a11a | * config.backend section is hidden - still, | show respective completion and minimal content
  • "disk metrics; CLI: verbose counters, empty version" bd927bc01 | * do not build disk metric names at runtime | * CLI: skip internal (lcache, stream) counters unless verbose | * CLI: version check vs. nodes in maintenance
  • "support per-backend cumulative "total" latencies" | * revise 'ais performance latency' | * use .total. latencies and their respective counters | * related commit: 5fd101589c7
  • "'show performance'" 0609b040c | * remove redundant alias
  • "update authn entities and templates" c0db9fa0b
  • "list-objects: color virtual dirs" 592f63bfd | * and show nothing in the "cached" column
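
To illustrate the difference between `AIS_AUTHN_TOKEN` (the token value itself) and `AIS_AUTHN_TOKEN_FILE` (a path to a file holding the token), introduced in 79543fd6a, here is a minimal Python sketch of one possible lookup order. The precedence (value first, then file), the JSON file layout with a `token` field, and the helper name `load_token` are assumptions, not a description of `api.LoadToken`.

```python
import json
import os
from typing import Mapping, Optional


def load_token(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Resolve an AuthN token from the environment (hypothetical lookup order)."""
    env = os.environ if env is None else env

    # 1. AIS_AUTHN_TOKEN carries the token value directly.
    token = env.get("AIS_AUTHN_TOKEN", "").strip()
    if token:
        return token

    # 2. AIS_AUTHN_TOKEN_FILE points at a file; a JSON {"token": "..."} layout is assumed.
    path = env.get("AIS_AUTHN_TOKEN_FILE", "").strip()
    if path and os.path.isfile(path):
        with open(path, encoding="utf-8") as f:
            return json.load(f).get("token") or None

    return None  # nothing configured
```

Keeping the two variables separate means a short-lived token can be injected into a container without writing it to disk, while the file variant still serves interactive logins.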

Python: SDK (AIStore, AuthN); PyTorch DataLoader; Tools

  • "sdk/python: release version 1.8.0" a8dd990ab
  • "sdk/python: refactor internal object classes for accessing and iterating over object content" 5ae897a0d
  • "sdk/python: refactor module structure" 0fc6e98b4
  • "sdk/python: add support for 'AIS_AUTHN_TOKEN' env var in SDK Client; bump version to v1.7.3" 49d772163
  • "sdk/python: release version 1.7.2" 32ececc3c
  • "sdk/python: memory usage optimization for ObjectFile" e804411fa
  • "sdk/python: add example object-file stress test" 78dd6c6f5
  • "sdk/python: release version 1.7.1" c2cebdf76
  • "python: fix date parsing by ensuring timezone-aware datetime objects (github-CI)" 9baf63aef | - ensure all datetime objects are timezone-aware in UTC | - fix date parsing issues encountered in github-CI
  • "sdk/python: object file max_resume per object file" 3eaf228d3
  • "sdk/python: objectfile patches (tests + resume logic)" 9803bf97f
  • "sdk/python: object group num_workers" c0faf87b6
  • "sdk/python: logging changes (decouple log config from package)" c95a9fa4b
  • "sdk/python: release version 1.7.0" 04261c497
  • "sdk/python: ObjectFile (File-Like Object)" 6a9c9fc76 | - ObjectFile (file-like object extending BufferedIOBase) with support for retries and error recovery, | including a notebook demo and tests. | - iter_from_position in ObjectReader, which returns an iterator over each chunk of bytes in the object | starting from the specified byte position, including tests. | - add integration tests for ObjectReader. | - update Python SDK documentation (and fixes for minor related issues in AuthN documentation generation).
  • "sdk/python: add extensive testing for AuthN module" 901379124 | - add individual tests for each permission (and derived role) using the Python SDK.
  • "sdk/python: add Python SDK AuthN README & update docs" 232622455 | - add README.md for Python SDK AuthN sub-package. | - update make generate-sdk-docs recipe (/python/Makefile) to include aistore/sdk/authn. | - update docs via generate-sdk-docs.
  • "python/authn: remove unused (derived) roles" 4d459b8b1 | - remove unsupported (only internally used) derived roles in AccessAttr class.
  • "sdk/python: pool request sessions across python processes" 06b3a29e9 | add a dict of request sessions that is indexed by process ID, thus removing any modification of the source clients.
  • "python: release sdk v1.6.0" 9e7655dfb
  • "python/authn: AuthN Error Handling" cb4ea3259 | - make the error handler method (raise_ais_error or raise_authn_error, with raise_ais_error as default) | a parameter of RequestClient, and modify both Client and AuthNClient to initialize | RequestClient with the proper error handler. | - separate errors and handling by package. | - minimally change client-side usage (usage of AuthNClient and Client remains the same; | only RequestClient usage changes, and it is rarely used directly by users, only internally by AuthNClient and Client).
  • "python/authn: add AuthN Client Logout" 4a4b131dd
  • "python/auth: add Tokens API" 100bf6b78
  • "sdk/python: refactor request client to move session specific properties to SessionManager" e743e01f1
  • "python/authn: add Users API" 604e60a9e | add the Users API to the authentication module, enabling the management of role-assigned users. | - add APIs to create, update, delete, get, and list users. | - add appropriate unit and integration tests.
  • "python/pytorch: Implement shuffling and custom saturation factor for dynamic sampling" d41d87e6c | - shuffling with the Dataloader doesn't work when using a custom sampler, so we implement it as part of our | - dynamic sampler. This shuffling works by generating a random list of indices using permutation. | - also, support user-provided saturation factors.
  • "python/pytorch: fix WorkerSessionManager returning None when using samplers with no workers" 84b335aa4 | when using a dynamic batch sampler without a dataloader (e.g., to just get the batch indices), | we trigger a bug where session manager returns None.
  • "sdk/python: add retry support via urllib3.Retry" ad78fa1d1
  • "sdk/python: fix type hint for python 3.8 compatibility" 18dc104a7 | - fix type hints for Python 3.8 compatibility (replace tuple, list, dict with Tuple, List, Dict from typing)
  • "python/authn: implement Roles API" b15826eab | add the Roles API to the authentication module, enabling role-based access control features. | - add APIs to create, update, delete, get, and list roles. | - add Unit and Integration tests.
  • "sdk/python: refactor object get to move request logic to object reader" 9562ed458
  • "sdk/python: implement prefixes for object groups" be347a692 | add prefix support to ObjectGroup which is needed in the AIS Pytorch datasets.
  • "python/pytorch: resnet50 using WebDataset" fb8757248 | an example for WebDataset training using AISShardReader and existing torch models.
  • "python/pytorch: use Tuple instead of tuple to support older python versions" 5f909a664 | python versions 3.8 and older cannot use tuple as a type directly; instead, we must import Tuple from typing.
  • "sdk/python: set custom obj props" a6afbb8e2
  • "python/authn: add cluster operations - Implemented methods for listing, retrieving, registering" 98b4be02a | - listing, retrieving, registering, updating, and deleting clusters
  • "python: fix env var for TLS" 0698ac2b3
  • "python: enhance client to accept tokens and implement authn login" 4a1949a7a | - update the AIStore Python client to accept authorization tokens. | - add authn login functionality to enable users to log in and obtain tokens using their credentials. | - unit + integration tests for AuthN in Python
  • "sdk/python: fix remote tests to avoid concurrent object access failures" be7ca27b3
  • "sdk/python: update release version" 227652334
  • "python/pytorch: solidify objects as the backing data structure type" 2bf96fb53
  • "sdk/python: support object props for object head" 010b80d1e
  • "python/pytorch: fix length for iterable datasets" c06c0a565
  • "python/pytorch: decode shards on client side in ShardReader and support non-uniform samples" 2120562e8
  • "python/pytorch: create classifier model training example" bcc1f613c
  • "python/pytorch: implement dynamic sampler for map based datasets" aaec09d4f
  • "python/pytorch: add multiple worker support to ShardReader" a553c2861
  • "python/pytorch: add progress bar to iter dataset with support for workers" 40c624bff
  • "python/s3compat: update certifi dependency" b86102eac
  • "python/pytorch: integrate alive_progress into ShardReader" ac71ef7d8
  • "python/pytorch: add support for multiple workers in iter datasets using worker slices" b1a5afd8b
  • "python/pytorch: update examples for pytorch datasets" b9b781e30
  • "python/pytorch: improve error handling for datasets and remove unused Client" 8572b8ef0
  • "python/pytorch: refactor datasets and utils" cef15ae7c
  • "sdk/python: modify Object.promote to return job ID for status check" 575a04296
  • "python/pytorch: add wrapper for parse_url to fix upstream torch imports" b628816e8
  • "pyaisloader: enable ETL option for pyaisloader benchmarks" 0732eb990
  • "sdk/python: implemented and tested prefix support in bucket summary and bucket info methods" abc0300c3
  • "python/pytorch: Calculate length in iter to save additional iteration cost" bf12042f6
  • "python: include STATUS_PARTIAL_CONTENT status code handling in bucket summary" 2800543fb
  • "python/pytorch: add ShardReader example to docs" 68a5a958c
  • "python/pytorch: implement WebDataset shard reader and tests" 1be385bc7
  • "python/pytorch: Refactor datasets into separate files" 68e98f41f
  • "pyaisloader: performance benchmark tests for AISDataset and AISIterDataset" 14cddfec2
  • "sdk/python: add Object.append_content method" 7699758f1
  • "python/pytorch: fix regression from 6904" 5429a7fbd
  • "python: release python sdk v1.4.23" 63bfb4253
  • "python: add functionality to fetch objects by URL - Implemented fetch_object_from_url function in the Client class to enable object retrieval using a URL." 7cbc80f17
  • "python: correct return type for raw stream in ObjectReader" 198935a30 | change the return type of the raw() method in ObjectReader to return the correct file-like | object instead of bytes. Improve docstrings and add type annotations for clarity.
  • "python/pytorch: follow-up" 74a8a5b74
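
The per-process session pooling described in 06b3a29e9 ("a dict of request sessions that is indexed by process ID") can be sketched as below. This is a generic, standard-library-only illustration rather than the SDK's code: the class name `PerProcessPool` and the `factory` parameter are hypothetical, and in practice the factory would create an HTTP session object.

```python
import os
from typing import Callable, Dict, Generic, TypeVar

T = TypeVar("T")


class PerProcessPool(Generic[T]):
    """Hand out one session-like object per OS process (hypothetical helper)."""

    def __init__(self, factory: Callable[[], T]):
        self._factory = factory
        self._by_pid: Dict[int, T] = {}

    def get(self) -> T:
        # A forked worker inherits a copy of the dict, but its own PID is not a key,
        # so it creates a fresh session instead of reusing the parent's connection.
        pid = os.getpid()
        session = self._by_pid.get(pid)
        if session is None:
            session = self._factory()
            self._by_pid[pid] = session
        return session
```

Indexing by PID leaves the client object itself untouched, which matches the stated goal of "removing any modification of the source clients" when a data loader spawns worker processes.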

Build; Lint; Continuous Integration (CI)

  • "github-CI: update actions/download-artifact to v4" e72fb7a93
  • "github-CI: follow-up; add support for python 3.8+, remove xattrs installation" 34e3b6070
  • "lint" 2e9edeec3 | * golangci 1.61.0 (was 1.60.2)
  • "CI: Allow changing runner tags via variables" 6759515d2
  • "github-CI: update upload-artifact v4 for pypi release" a6612f7ec
  • "gitlab-CI: only run python authn tests on build success" ace4bce22
  • "gitlab-CI: fix AuthN python tests not triggering automatically" d686c5991 | - add when: always to the rules for authn label and directory changes to ensure automatic job triggering. | - reorder rules to avoid every job defaulting to manual triggers.
  • "bump 'google-protobuf' to address CVE-2024-7254" eb369d4c8
  • "bump rexml to address CVE-2024-41946" 09a8b0377 | https://nvd.nist.gov/vuln/detail/CVE-2024-41946
  • "github-CI: remove unnecessary dirs, change space config" f3369232c
  • "gitlab-CI: fix" 8d6f47f8d
  • "tools: amend test skipping logic" 8f50f0439
  • "lint; up cli" 3500e9288 | * golangci 1.60.2 (was 1.60.1); linters: | - exportloopref (deprecated) | + copyloopvar linter
  • "build: bump rc3" 461a64bc1 | * v3.24.rc3
  • "CI: Do not always run python-authn tests" 1d41eced3
  • "CI: correct ishard long test directory" e36e7a576
  • "name-is-too-long, and similar cleanups (minor)" a9a9a6bdd | * checkmarx pass two
  • "checkmarx compliance, part 1" c751f263d
  • "build: upgrade OSS packages" a0a2b0006 | * aistore and cli, both
  • "CI: fix" eae7d4c9c
  • "github-CI: add authn tests" 080809de1
  • "CI: Python SDK AuthN Tests" cbc3fbc60
  • "lint" 9f51aed7a | * golangci 1.60.1 (was 1.59.1)
  • "CI: standardize github configuration" 875e0e911
  • "CI: add botocore dependencies for lint" f3733d3bf
  • "CI: update pylint" fa96f71f4
  • "CI: update and simplify Dockerfile" 268cfd131
  • "deploy: log pod errors upon minikube setup timeout" 0d437de0b | ensures any issues causing the timeout are captured and logged for easier debugging in github workflow
  • "scripts: add rancher lpp to gitlab runner setup script" f72fc7dd6
  • "scripts: add colour to ignore rules list for spell check" b28cbb5e8
  • "build: demote ht:// backend; revise local-playground scripts" 1a028279b | * loopback count and size | * default number of mountpaths = 4 | * docs/getting-started | * part three, prev. commit: aa9f4288658
  • "build: demote ht:// backend; revise local-playground scripts" aa9f42886 | * add build tag ht; link ht:// conditionally | * related commit: 50db672cb34f90
  • "build: bump rc2 cli" 69d2659e3
  • "build: bump rc2" abe0c157a | * v3.24.rc1
  • "build: bump rc1" 83f6c9372 | * v3.24.rc1
  • "CI: build AuthN image in docker workflow" eb9aa1dbe
  • "up cli" 44324256a | * compile with extended FSHC config | * support per-backend metrics ('get-cold' removed)
  • "CI: fix netflify" 843cd65f7
  • "dependabot fix: REXML denial of service vulnerability" d3e470deb | * fix for https://github.com/NVIDIA/aistore/security/dependabot/25
  • "fix: include utils.sh source in minikube deployment" 3c0a49a1e | - since commit 50db672cb, some utility functions used in aisnode_config.sh are missing during minikube deployment | - the fix ensures that utils.sh script is imported in both docker image build and minikube deployment.
  • "fix: use portable env var check in deploy script for compatibility with macOS zsh" 8e5ed0006
  • "authn: fix local deployment environment variable renaming" eae062e97 | - corrected environment variable in local deployment setup.
  • "deploy: standardize Makefile for container build" 316e344fb
  • "deploy: remove readiness script from aisnode container" 137e58791
  • "general: remove Terraform as deployment option" 72fee04ff
  • "scripts: fix clean_deploy panic and implement empty value checking" 8f7c3ee3a
  • "build: upgrade grpc" de58a6eb1 | * https://github.com/NVIDIA/aistore/security/dependabot/24
  • "CI: cleanup around linter config" 3c6a02484
  • "deploy: generalize and fix building aisnode image" 434a89c18
  • "build: aisnode Dockerfile follow-up" 7a0916fdd
  • "build: standardize builder stage in Dockerfile" 8670a3361
  • "build/CLI: upgrade OSS packages" 3e21e3eb4 | * part two, prev. commit: c331cf26dd0e6
  • "build: upgrade OSS packages" c331cf26d | * prev. commit: f2ee1c0c18726e21
  • "CI: add pytorch integration tests to CI" 65d02ae90
  • "build: upgrade OSS packages; update Go toolchain" f2ee1c0c1 | * aistore and cli, both | * go get -u ./... && go get [email protected] | * prev. related: ceb7159b82a71afa

Documentation and Tests

Technical Blog

  • "blog: Google Colab + AIStore" fb3fe2b7f
  • "docs: presigned s3 requests; edits" 164bf0ef9
  • "docs: update Python SDK streaming object file example with retry" d3a541ff7
  • "docs: cli/advanced.md, environment-vars.md, https.md, authn.md, cli.md" e40ec36d1 | * new content mostly around https | * v3.24 updates | * cross-references, etc. text works
  • "docs: update AIStore setup instructions for Google Colab with notebook link" fbb20626c
  • "docs: running AIStore in Google Colab" 53d4e43a3
  • "docs: update docker-single readme, add multi-disk example" 110ecb276
  • "docs: update overview, terminology" 7776b1dea
  • "tech blog site: aistore.nvidia.com" f7fa977cc | * s/aiatscale.org/aistore.nvidia.com/ | * Makefile follow-up
  • "docs: edits across the board" bef16fb38
  • "docs: aistorage/cluster-minimal readme" 1f909bf08
  • "docs: create pytorch docs and update make generate-docs" 34a00ba5b
  • "blog: initial sharding (ishard)" 35bca58f5
  • "docs: add metric names-and-types reference" 1923f3dc4 | * include both internal and externally visible names, descriptions, and labels
  • "docs: amend LRU and Space configuration" 771ff60b7
  • "docs: update authn documentation" 68d669f17
  • "docs: disable/enable cloud backend at runtime" b0e5db9eb | * part three, prev. commit: acdfb0398439c
  • "docs: add howto-virtual-directories" c73d275ed | * prev. commits: 592f63bfd814, b630043f7198
  • "docs: remove mentions of du and df commands" 59719f66e
  • "docs: clarify purpose and implementation of botocore patch testing" 5c3fa229c
  • "docs: getting-started '--cleanup' option" 8bcfa8269
  • "docs: main readme, bucket inventory" 69041a17a | * also, un-defer s3 put datapath (minor)
  • "local playground: environment vs STDIN; TAGS vs AIS_BACKEND_PROVIDERS" 50db672cb | * (usability)
  • "local playground with disks; up cli" 7b9faaf15 | * introduce AIS_LOCAL_PLAYGROUND env | * revert commit a53d3b0b8eb4b (ais/utils)
  • "local playground with disks; clarify" a53d3b0b8 | * skip or not to skip localhost | * clarify one possible PUT fail
  • "local-playground: warn extended attributes may not be supported" b108c4202
  • "tests: run python ETL tests with reworked k8s CI setup" cd5277768
  • "tests: a job we try to abort may have already finished" 7b0b0bca0
  • "tests: fix running/finished race" 884fba383
  • "tests: when killing/restoring nodes" 33fa512cb | * always try to wait for the original (prior to test running) node counts | * note: proxyURL is global but often used as a local var
  • "tests: fix mock cloud backend" 4cfb1f22a
  • "tests: implemented and refactored ishard long stress test" 00586467e
  • "tests: cleanup logs in minikube VM after running k8s tests" 888da5de8
  • "tests: add ETL tests for concurrent transformations and various object sizes" 727662f61
  • "test-skipping logic (minor)" 5bf9321ec | * fix 7d0d196723f2c5