Previous change logs can be found at CHANGELOG-3.0.
The minimum recommended etcd versions to run in production are 3.1.11+, 3.2.26+, and 3.3.11+.
v3.1.21 (2019-TBD)
- Strip out insecure endpoints from DNS SRV records when using discovery with etcdctl v2
- Add etcdctl endpoint health --write-out support.
  - Previously, etcdctl endpoint health --write-out json did not work.
  - The command output is changed. Previously, if an endpoint was unreachable, the command output was "<endpoint> is unhealthy: failed to connect: <error message>". This change unifies the error message, so all error types now produce the same output: "<endpoint> is unhealthy: failed to commit proposal: <error message>".
See List of metrics for all metrics per release.
Note that any etcd_debugging_* metrics are experimental and subject to change.
v3.1.20 (2018-10-10)
See code changes and v3.1 upgrade guide for any breaking changes.
Again, before running upgrades from any previous release, please make sure to read change logs below and v3.1 upgrade guide.
- Improve "became inactive" warning log, which indicates that sending a message to a peer failed.
- Improve read index wait timeout warning log, which indicates that the local node might have a slow network.
- Add gRPC interceptor for debugging logs; enable etcd --debug flag to see per-request debug information.
- Add consistency check in snapshot status. If consistency check on snapshot file fails, snapshot status returns "snapshot file integrity check failed..." error.
See List of metrics for all metrics per release.
Note that any etcd_debugging_* metrics are experimental and subject to change.
- Improve etcd_network_peer_round_trip_time_seconds Prometheus metric to track leader heartbeats.
  - Previously, it only sampled the TCP connection for snapshot messages.
- Display all registered gRPC metrics at start.
- Add etcd_snap_db_fsync_duration_seconds_count Prometheus metric.
- Add etcd_snap_db_save_total_duration_seconds_bucket Prometheus metric.
- Add etcd_network_snapshot_send_success Prometheus metric.
- Add etcd_network_snapshot_send_failures Prometheus metric.
- Add etcd_network_snapshot_send_total_duration_seconds Prometheus metric.
- Add etcd_network_snapshot_receive_success Prometheus metric.
- Add etcd_network_snapshot_receive_failures Prometheus metric.
- Add etcd_network_snapshot_receive_total_duration_seconds Prometheus metric.
- Add etcd_server_id Prometheus metric.
- Add etcd_server_health_success Prometheus metric.
- Add etcd_server_health_failures Prometheus metric.
- Add etcd_server_read_indexes_failed_total Prometheus metric.
- Fix logic on release lock key if cancelled in clientv3/concurrency package.
- Compile with Go 1.8.7.
v3.1.19 (2018-07-24)
See code changes and v3.1 upgrade guide for any breaking changes.
Again, before running upgrades from any previous release, please make sure to read change logs below and v3.1 upgrade guide.
See List of metrics for all metrics per release.
Note that any etcd_debugging_* metrics are experimental and subject to change.
- Add etcd_server_go_version Prometheus metric.
- Add etcd_server_slow_read_indexes_total Prometheus metric.
- Add etcd_server_quota_backend_bytes Prometheus metric.
  - Use it with etcd_mvcc_db_total_size_in_bytes and etcd_mvcc_db_total_size_in_use_in_bytes.
  - etcd_server_quota_backend_bytes 2.147483648e+09 means current quota size is 2 GB.
  - etcd_mvcc_db_total_size_in_bytes 20480 means current physically allocated DB size is 20 KB.
  - etcd_mvcc_db_total_size_in_use_in_bytes 16384 means future DB size if defragment operation is complete.
  - etcd_mvcc_db_total_size_in_bytes - etcd_mvcc_db_total_size_in_use_in_bytes is the number of bytes that can be saved on disk with defragment operation.
- Add etcd_mvcc_db_total_size_in_bytes Prometheus metric.
  - In addition to etcd_debugging_mvcc_db_total_size_in_bytes.
- Add etcd_mvcc_db_total_size_in_use_in_bytes Prometheus metric.
  - Use it with etcd_mvcc_db_total_size_in_bytes and etcd_mvcc_db_total_size_in_use_in_bytes.
  - etcd_server_quota_backend_bytes 2.147483648e+09 means current quota size is 2 GB.
  - etcd_mvcc_db_total_size_in_bytes 20480 means current physically allocated DB size is 20 KB.
  - etcd_mvcc_db_total_size_in_use_in_bytes 16384 means future DB size if defragment operation is complete.
  - etcd_mvcc_db_total_size_in_bytes - etcd_mvcc_db_total_size_in_use_in_bytes is the number of bytes that can be saved on disk with defragment operation.
- Fix lease keepalive interval updates when response queue is full.
  - If <-chan *clientv3.LeaseKeepAliveResponse from clientv3.Lease.KeepAlive was never consumed or the channel was full, the client was sending a keepalive request every 500ms instead of the expected rate of every "TTL / 3" duration.
- Compile with Go 1.8.7.
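
The arithmetic described for etcd_server_quota_backend_bytes in this release can be sketched as below. This is a minimal illustration, not etcd code: it reuses the three sample values quoted above, and the helper names (parse_metrics, reclaimable_bytes, quota_used_fraction) are hypothetical. It also assumes unlabelled "name value" lines as they would appear in a metrics scrape.

```python
# Sample values copied from the changelog entry above.
SAMPLE = """\
etcd_server_quota_backend_bytes 2.147483648e+09
etcd_mvcc_db_total_size_in_bytes 20480
etcd_mvcc_db_total_size_in_use_in_bytes 16384
"""


def parse_metrics(text):
    """Parse unlabelled 'name value' lines into a dict of floats."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        name, value = line.split()
        metrics[name] = float(value)
    return metrics


def reclaimable_bytes(metrics):
    """Bytes that a defragment operation could save on disk."""
    return (metrics["etcd_mvcc_db_total_size_in_bytes"]
            - metrics["etcd_mvcc_db_total_size_in_use_in_bytes"])


def quota_used_fraction(metrics):
    """Share of the backend quota taken by the physically allocated DB."""
    return (metrics["etcd_mvcc_db_total_size_in_bytes"]
            / metrics["etcd_server_quota_backend_bytes"])


if __name__ == "__main__":
    m = parse_metrics(SAMPLE)
    print(int(reclaimable_bytes(m)), quota_used_fraction(m))
```

With the sample values, the difference between the allocated size (20480) and the in-use size (16384) is 4096 bytes, which is the space a defragment operation would be expected to return.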
v3.1.18 (2018-06-15)
See code changes and v3.1 upgrade guide for any breaking changes.
Again, before running upgrades from any previous release, please make sure to read change logs below and v3.1 upgrade guide.
See List of metrics for all metrics per release.
Note that any etcd_debugging_* metrics are experimental and subject to change.
- Add etcd_server_version Prometheus metric.
  - To replace Kubernetes etcd-version-monitor.
- Compile with Go 1.8.7.
v3.1.17 (2018-06-06)
See code changes and v3.1 upgrade guide for any breaking changes.
Again, before running upgrades from any previous release, please make sure to read change logs below and v3.1 upgrade guide.
- Fix v3 snapshot recovery.
  - A follower receives a leader snapshot to be persisted as a [SNAPSHOT-INDEX].snap.db file on disk.
  - Now, the server ensures that the incoming snapshot is persisted on disk before loading it.
  - Otherwise, an index mismatch happens and triggers a server-side panic (e.g. newer WAL entry with outdated snapshot index).
- Compile with Go 1.8.7.
v3.1.16 (2018-05-31)
See code changes and v3.1 upgrade guide for any breaking changes.
Again, before running upgrades from any previous release, please make sure to read change logs below and v3.1 upgrade guide.
- Fix mvcc server panic from restore operation.
  - Let's assume that a watcher had been requested with a future revision X and sent to node A, which became network-partitioned thereafter. Meanwhile, the cluster makes progress. Then, when the partition gets removed, the leader sends a snapshot to node A. Previously, if the snapshot's latest revision was still lower than the watch revision X, the etcd server panicked during the snapshot restore operation.
  - Now, this server-side panic has been fixed.
- Compile with Go 1.8.7.
v3.1.15 (2018-05-09)
See code changes and v3.1 upgrade guide for any breaking changes.
Again, before running upgrades from any previous release, please make sure to read change logs below and v3.1 upgrade guide.
- Purge old *.snap.db snapshot files.
  - Previously, etcd did not respect --max-snapshots flag to purge old *.snap.db files.
  - Now, etcd purges old *.snap.db files to keep maximum --max-snapshots number of files on disk.
- Compile with Go 1.8.7.
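
The *.snap.db retention rule in this release can be illustrated with a short sketch. It is not etcd's implementation: files_to_purge is a hypothetical helper, the file names are made up, and it assumes that names sort oldest-first lexicographically. Only values of max_snapshots of 1 or more are modelled.

```python
def files_to_purge(names, max_snapshots):
    """Return the *.snap.db names to delete so that at most
    max_snapshots of them remain (the newest ones are kept).

    Assumption: names sort oldest-first, e.g. zero-padded indexes.
    """
    if max_snapshots < 1:
        raise ValueError("this sketch only models max_snapshots >= 1")
    snap_dbs = sorted(n for n in names if n.endswith(".snap.db"))
    # Everything except the last max_snapshots entries is purged.
    return snap_dbs[:-max_snapshots]


if __name__ == "__main__":
    # Hypothetical directory listing.
    listing = [
        "0000000000000001.snap.db",
        "0000000000000003.snap.db",
        "0000000000000002.snap.db",
        "0000000000000002.snap",  # not a *.snap.db file, left alone
    ]
    print(files_to_purge(listing, 2))
```

Files that do not end in .snap.db are ignored here, since the entry above only concerns *.snap.db files.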
v3.1.14 (2018-04-24)
See code changes and v3.1 upgrade guide for any breaking changes.
Again, before running upgrades from any previous release, please make sure to read change logs below and v3.1 upgrade guide.
See List of metrics for all metrics per release.
Note that any etcd_debugging_* metrics are experimental and subject to change.
- Add etcd_server_is_leader Prometheus metric.
- Add --initial-election-tick-advance flag to configure initial election tick fast-forward.
  - By default, --initial-election-tick-advance=true, so the local member fast-forwards election ticks to speed up the "initial" leader election trigger.
  - This benefits the case of larger election ticks. For instance, a cross-datacenter deployment may require a longer election timeout of 10 seconds. If true, the local node does not need to wait up to 10 seconds. Instead, it forwards its election ticks to 8 seconds and has only 2 seconds left before leader election.
  - The major assumptions are that either the cluster has no active leader, so advancing ticks enables faster leader election, or the cluster already has an established leader, and the rejoining follower is likely to receive heartbeats from the leader after the tick advance and before the election timeout.
  - However, when the network from the leader to the rejoining follower is congested, and the follower does not receive a leader heartbeat within the remaining election ticks, a disruptive election has to happen, which affects cluster availability.
  - Now, this can be disabled by setting --initial-election-tick-advance=false.
  - Disabling this would slow down the initial bootstrap process for cross-datacenter deployments. Make tradeoffs by configuring --initial-election-tick-advance at the cost of slow initial bootstrap.
  - If single-node, it advances ticks regardless.
  - Address disruptive rejoining follower node.
- Compile with Go 1.8.7.
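
The 10-second example given for --initial-election-tick-advance can be worked through as follows. This is a rough sketch under stated assumptions, not etcd's logic: election_wait is a hypothetical helper, a 1-second tick is assumed, and ticks_retained=2 is chosen only because it reproduces the "8 seconds advanced, 2 seconds left" figures from the entry above.

```python
def election_wait(election_timeout_s, tick_s=1.0, ticks_retained=2, advance=True):
    """Return (seconds fast-forwarded, seconds left before an election).

    ticks_retained is a hypothetical parameter picked to match the
    10-second example; advance=False models the flag set to false.
    """
    total_ticks = round(election_timeout_s / tick_s)
    advanced_ticks = max(total_ticks - ticks_retained, 0) if advance else 0
    remaining_ticks = total_ticks - advanced_ticks
    return advanced_ticks * tick_s, remaining_ticks * tick_s


if __name__ == "__main__":
    print(election_wait(10))                 # tick advance enabled
    print(election_wait(10, advance=False))  # tick advance disabled
```

The second call shows the tradeoff named above: with the advance disabled, the node waits out the full election timeout on start, which is slower but leaves the leader more time to reach a rejoining follower.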
v3.1.13 (2018-03-29)
See code changes and v3.1 upgrade guide for any breaking changes.
Again, before running upgrades from any previous release, please make sure to read change logs below and v3.1 upgrade guide.
- Adjust election timeout on server restart to reduce disruptive rejoining servers.
  - Previously, etcd fast-forwarded election ticks on server start, with only one tick left for leader election. This was to speed up the start phase, without having to wait until all election ticks elapse. Advancing election ticks is useful for cross-datacenter deployments with larger election timeouts. However, it was affecting cluster availability if the last tick elapsed before the leader contacted the restarted node.
  - Now, when etcd restarts, it adjusts election ticks with more than one tick left, giving the leader more time to prevent a disruptive restart.
See List of metrics for all metrics per release.
Note that any etcd_debugging_* metrics are experimental and subject to change.
- Add missing etcd_network_peer_sent_failures_total count.
- Compile with Go 1.8.7.
v3.1.12 (2018-03-08)
See code changes and v3.1 upgrade guide for any breaking changes.
Again, before running upgrades from any previous release, please make sure to read change logs below and v3.1 upgrade guide.
- Fix mvcc "unsynced" watcher restore operation.
  - An "unsynced" watcher is a watcher that needs to be in sync with events that have happened.
  - That is, an "unsynced" watcher is the slow watcher that was requested on an old revision.
  - The "unsynced" watcher restore operation was not correctly populating its underlying watcher group.
  - This could cause missing events from "unsynced" watchers.
  - A node gets network-partitioned with a watcher on a future revision, and falls behind, receiving a leader snapshot after the partition gets removed. When applying this snapshot, etcd watch storage moves current synced watchers to unsynced, since synced watchers might have become stale during the network partition, and resets the synced watcher group to restart watcher routines. Previously, there was a bug when moving from the synced watcher group to unsynced, so the client would miss events when the watcher was requested to the network-partitioned node.
- Compile with Go 1.8.7.
v3.1.11 (2017-11-28)
See code changes and v3.1 upgrade guide for any breaking changes.
Again, before running upgrades from any previous release, please make sure to read change logs below and v3.1 upgrade guide.
- #8411,#8806 backport "mvcc: sending events after restore"
- #8009,#8902 backport coreos/bbolt v1.3.1-coreos.5
- Compile with Go 1.8.5.
v3.1.10 (2017-07-14)
See code changes and v3.1 upgrade guide for any breaking changes.
Again, before running upgrades from any previous release, please make sure to read change logs below and v3.1 upgrade guide.
- Tag docker images with minor versions.
  - e.g. docker pull quay.io/coreos/etcd:v3.1 to fetch latest v3.1 versions.
- Compile with Go 1.8.3.
  - Fix panic on net/http.CloseNotify
v3.1.9 (2017-06-09)
See code changes and v3.1 upgrade guide for any breaking changes.
Again, before running upgrades from any previous release, please make sure to read change logs below and v3.1 upgrade guide.
- Allow v2 snapshot over 512MB.
- Compile with Go 1.7.6.
v3.1.8 (2017-05-19)
See code changes and v3.1 upgrade guide for any breaking changes.
Again, before running upgrades from any previous release, please make sure to read change logs below and v3.1 upgrade guide.
- Compile with Go 1.7.5.
v3.1.7 (2017-04-28)
See code changes and v3.1 upgrade guide for any breaking changes.
Again, before running upgrades from any previous release, please make sure to read change logs below and v3.1 upgrade guide.
- Compile with Go 1.7.5.
v3.1.6 (2017-04-19)
See code changes and v3.1 upgrade guide for any breaking changes.
Again, before running upgrades from any previous release, please make sure to read change logs below and v3.1 upgrade guide.
- Fill in Auth API response header.
- Remove auth check in Status API.
- Compile with Go 1.7.5.
v3.1.5 (2017-03-27)
See code changes and v3.1 upgrade guide for any breaking changes.
Again, before running upgrades from any previous release, please make sure to read change logs below and v3.1 upgrade guide.
- Fix raft memory leak issue.
- Fix Windows file path issues.
- Add /etc/nsswitch.conf file to alpine-based Docker image.
- Compile with Go 1.7.5.
v3.1.4 (2017-03-22)
See code changes and v3.1 upgrade guide for any breaking changes.
Again, before running upgrades from any previous release, please make sure to read change logs below and v3.1 upgrade guide.
- Compile with Go 1.7.5.
v3.1.3 (2017-03-10)
See code changes and v3.1 upgrade guide for any breaking changes.
Again, before running upgrades from any previous release, please make sure to read change logs below and v3.1 upgrade guide.
- Fix etcd gateway schema handling in DNS discovery.
- Fix sd_notify behaviors in gateway, grpc-proxy.
- Use machine default host when advertise URLs are default values (localhost:2379,2380) AND if listen URL is 0.0.0.0.
- Compile with Go 1.7.5.
v3.1.2 (2017-02-24)
See code changes and v3.1 upgrade guide for any breaking changes.
Again, before running upgrades from any previous release, please make sure to read change logs below and v3.1 upgrade guide.
- Fix etcd gateway with multiple endpoints.
- Use IPv4 default host, by default (when IPv4 and IPv6 are available).
- Compile with Go 1.7.5.
v3.1.1 (2017-02-17)
See code changes and v3.1 upgrade guide for any breaking changes.
Again, before running upgrades from any previous release, please make sure to read change logs below and v3.1 upgrade guide.
- Compile with Go 1.7.5.
v3.1.0 (2017-01-20)
See code changes and v3.1 upgrade guide for any breaking changes.
Again, before running upgrades from any previous release, please make sure to read change logs below and v3.1 upgrade guide.
- Faster linearizable reads (implements Raft read-index).
- v3 authentication API is now stable.
- Deprecated following gRPC metrics in favor of go-grpc-prometheus.
  - etcd_grpc_requests_total
  - etcd_grpc_requests_failed_total
  - etcd_grpc_active_streams
  - etcd_grpc_unary_requests_duration_seconds
- Upgrade github.com/ugorji/go/codec to ugorji/go@9c7f9b7, and regenerate v2 client.
See security doc for more details.
- SRV records (e.g., infra1.example.com) must match the discovery domain (i.e., example.com) if no custom certificate authority is given.
  - TLSConfig.ServerName is ignored with user-provided certificates for backwards compatibility; to be deprecated.
  - For example, etcd --discovery-srv=example.com will only authenticate peers/clients when the provided certs have root domain example.com as an entry in Subject Alternative Name (SAN) field.
- Automatic leadership transfer when leader steps down.
- etcd flags
  - --strict-reconfig-check flag is set by default.
  - Add --log-output flag.
  - Add --metrics flag.
- etcd uses default route IP if advertise URL is not given.
- Cluster rejects removing members if quorum will be lost.
- Discovery now has upper limit for waiting on retries.
- Warn on binding listeners through domain names; to be deprecated.
- v3.0 and v3.1 with --auto-compaction-retention=10 run periodic compaction on the v3 key-value store every 10 hours.
  - Compactor only supports periodic compaction.
  - Compactor records latest revisions every 5 minutes, until it reaches the first compaction period (e.g. 10 hours).
  - In order to retain key-value history of the last compaction period, it uses the last revision that was fetched before the compaction period, from the revision records that were collected every 5 minutes.
  - When --auto-compaction-retention=10, compactor uses revision 100 for compact revision where revision 100 is the latest revision fetched from 10 hours ago.
  - If compaction succeeds or the requested revision has already been compacted, it resets the period timer and starts over with new historical revision records (e.g. restart revision collect and compact for the next 10-hour period).
  - If compaction fails, it retries in 5 minutes.
- Add SetEndpoints method; update endpoints at runtime.
- Add Sync method; auto-update endpoints at runtime.
- Add Lease TimeToLive API; fetch lease information.
- Replace Config.Logger field with global logger.
- Get API responses are sorted in ascending order by default.
- Add lease timetolive command.
- Add --print-value-only flag to get command.
- Add --dest-prefix flag to make-mirror command.
- get command responses are sorted in ascending order by default.
- Experimental gRPC proxy feature.
- recipes now conform to sessions defined in clientv3/concurrency.
- ACI has symlinks to /usr/local/bin/etcd*.
- Compile with Go 1.7.4.
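
The periodic compaction behaviour described for --auto-compaction-retention=10 in this release can be modelled with a toy simulation. This is a sketch under assumptions, not etcd's compactor: the PeriodicCompactor class and its tick method are hypothetical, each tick stands for one 5-minute sampling interval, and the retry-on-failure path is not modelled.

```python
class PeriodicCompactor:
    """Toy model of periodic compaction (hypothetical, not etcd code)."""

    def __init__(self, retention_hours, sample_minutes=5):
        # Number of sampling intervals in one compaction period,
        # e.g. 10 hours / 5 minutes = 120.
        self.samples_per_period = retention_hours * 60 // sample_minutes
        self.revs = []  # revisions recorded at each sampling interval

    def tick(self, current_rev):
        """Record the latest revision; return the revision to compact
        once a full period has elapsed, otherwise None."""
        self.revs.append(current_rev)
        if len(self.revs) <= self.samples_per_period:
            return None  # first period not reached yet
        compact_rev = self.revs[0]  # revision fetched one period ago
        # Reset the period and start over with new revision records.
        self.revs = [current_rev]
        return compact_rev


if __name__ == "__main__":
    compactor = PeriodicCompactor(retention_hours=10)
    # Pretend the store revision grows by one every 5 minutes from 100.
    for i in range(241):
        rev = compactor.tick(100 + i)
        if rev is not None:
            print("after", i * 5, "minutes compact at revision", rev)
```

In this run the first compaction fires 600 minutes in and targets revision 100, the revision recorded 10 hours earlier, which mirrors the "revision 100" example above.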