Releases · owtech/foundationdb
v7.1.33-1.ow
Merged 7.1.33 from upstream into ow-fork (#73)
v7.2.7-1.ow
ow-fork 7.2.7 (#72)
v7.3.0-1.ow
Reduced the number of parallel build threads from 3 to 2. (#71)

* Attempt to build with parallel 4
* Build with parallel 2

---------

Co-authored-by: Oleg Samarin <[email protected]>
v7.1.31-1.ow
Merged upstream 7.1.31 (#68)

* Fix transaction_too_old error when version vector is enabled. When VV is enabled, the comparison of storage server version and read version should use the original read version; otherwise, the client may get a wrong transaction_too_old error.
* Fix assertions w.r.t. VV
* Avoid using oldest version as read version for VV
* Disable a debugging trace event
* Cherry pick 8630
* Address review comments
* enable AVX and update version for 7.1.25 release
* skip proxy when fetching kubectl
* add generated.go
* update version after 7.1.25 release
* Add changes to generated.go from PR8761, and remove change to ConfigureCompiler.cmake
* Update generated.go
* Update generated.go with 8761
* Rocksdb stats level knob. (#8713)
* Adding counters for singlekey clear requests (#8792)
* add byteLimit for prefetch (see the sketch after these notes). This is a patch to release-7.1, after resolving conflicts from the commit in the main branch, to enable byteLimit in release-7.1. A fraction of byteLimit is used as the limit for fetching the index; for the indexes fetched, records are fetched in batches. byteLimit always counts the index size and also counts the record if it exists; at least one index-record entry is returned, and the last entry is always included even if adding it exceeds the limit. There is a knob, STRICTLY_ENFORCE_BYTE_LIMIT: when it is set, records are discarded once byteLimit is hit, even though they were already fetched; otherwise, the whole batch is returned.
* debug seg fault
* Revert "debug seg fault". This reverts commit fadcb0820c8a5901bbefe70825bd7a77ebc93081.
* [release-7.1] Add SS read range bytes metrics. (#8697) (#8724)
* Add SS read range bytes metrics. (#8697)
* Fix build failure
* clang-fmt
* fmt
* Rocksdb suggest compact range checks
* RocksDB 7.7.3 version upgrade
* Fix backup worker assertion failure. The number of released bytes exceeded the number of acquired bytes in locks, because the bytes counted towards release were calculated after a "wait", when more bytes could be allocated.
* Increase buggified lock bytes for backup workers, to fix simulation failures where the knob value was too small.
* Send error when LogRouterPeekPopped happens. Otherwise, the remote tlog won't get a response and the parallel peek requests will never be cleared, blocking subsequent peeks. As a result, the remote tlog can no longer pop the log router, which in turn can no longer peek tlogs, and the whole remote side becomes blocked.
* Add more debug events
* Add DebugTrace.h to 7.1 branch. Cherry-picking PR #8856 requires DebugTrace.h due to the use of the DebugLogTraceEvent function.
* Fix int32 variable overflow (see the sketch after these notes): 1. the content length from an HTTP response parsed with 'atoi' could overflow int32; 2. aligning the offset could overflow int32.
* Fix -Wformat warning
* Make gray failure degraded server selection deterministic
* format source code after switch to clang 15
* Fix clang 15 compile errors
* Fix gcc 11 compile errors
* Fix more warnings
* Moving rocksdb read iterator destruction from commit path to actor. (#8971)
* Release 7.1: Cherry pick pull request #9033 (#9037)
* Merge pull request #9033 from sbodagala/main
* Code formatting. Co-authored-by: Jingyu Zhou <[email protected]>
* Fix: Exclusion stuck because DD cannot build new teams. Bug behavior: when DD has zero healthy machine teams but more unhealthy machine teams than the max machine teams DD plans to build, DD stops building new machine teams. Due to zero healthy machine teams (and zero healthy server teams), DD cannot find a healthy destination team to relocate data; when data relocation stops, exclusion stops progressing and gets stuck. The bug happens when we *shrink* a k-host cluster by first adding k/2 new hosts, then quickly excluding all old hosts. Fix: let DD build temporary extra teams to relocate data; the extra teams are cleaned up later by DD's remove-extra-teams logic. Simulation test: there is no simulation test covering the cluster expansion scenario, so to most closely simulate this behavior we intentionally overbuild all possible machine teams to trigger the condition that unhealthy teams exceed the maximum teams DD wants to build later.
* Resolve review comment: No functional change
* Add back samples for (non)empty peeks stats [release-7.1] (#9074). These were lost, likely due to refactoring. Now TLogMetrics have meaningful data like: TLogMetrics ID=59ec9c67b4d07433 Elapsed=5 BytesInput=0 -1 17048 BytesDurable=47.4 225.405 17048 BlockingPeeks=0 -1 0 BlockingPeekTimeouts=0 -1 0 EmptyPeeks=1.6 2.79237 236 NonEmptyPeeks=0 -1 32 ...
* Use LATENCY_SAMPLE_SIZE
* fix health monitor last logged time
* Backport RocksDB cmake file to 7.1 (#9093)
* Fix the RocksDB compile issue with clang. By default, RocksDB uses its own compile/link flags regardless of FDB's. This led to the issue that if FDB decides to use clang/lld/libc++, RocksDB picks up the compiler/linker but still uses libstdc++, which is incompatible with libc++, causing missing-symbol errors during the link stage. With this patch, if FDB uses libc++, that information is stored in CMAKE_CXX_FLAGS and forwarded to RocksDB; RocksDB then uses libc++ and is compatible with FDB.
* fixup! Fix the clang error in bindings/c
* add some rocksdb compile options that can be passed in at build time
* Disconnection to satellite TLog should trigger recovery in gray failure detection
* Upgrade sphinx and document test harness and code probes
* Apply suggestions from code review. Co-authored-by: Trevor Clinkenbeard <[email protected]> Co-authored-by: Bharadwaj V.R <[email protected]>
* clarify how code probes are reported
* clarify statistics of TestHarness
* Bump setuptools from 65.3.0 to 65.5.1 in /documentation/sphinx. Bumps [setuptools](https://github.com/pypa/setuptools) from 65.3.0 to 65.5.1. [Release notes](https://github.com/pypa/setuptools/releases) · [Changelog](https://github.com/pypa/setuptools/blob/main/CHANGES.rst) · [Commits](https://github.com/pypa/setuptools/compare/v65.3.0...v65.5.1). updated-dependencies: dependency-name: setuptools, dependency-type: direct:production. Signed-off-by: dependabot[bot] <[email protected]>
* Add event for txn server initialization and a warning for TLog slow catching up
* Change TLog pull async data warning timeout
* Adding rocksDB control compaction on deletion knobs. (#9165)
* Add 7.1.26, 7.1.27 release notes (#9186)
* Added metrics for read range operations.
* Log PingLatency when there are no ping latency samples but there are ping attempts
* Changing histogram type. (#9227)
* Release 7.1: Cherry pick pull request #9225 (#9252). Do not add fdbserver processes to the client list. (#9225) Note: server processes started getting reported as clients since 7.1.0 (not sure whether this change in behavior was intentional), and this breaks the operator upgrade logic.
* Address a compilation error
* Update release-notes.
* Address a review comment/CI failure.
* Address a CI failure related to release notes.
* disable AVX for 7.1.26 release
* enable AVX and update version for 7.1.27 release
* update version after 7.1.27 release
* Increase buggified lock bytes for backup workers to at least 256 MB. We still encountered simulation failures where the backup worker is waiting on the lock and an assertion fails.
* Reduce logging level for verbose events. From one nightly failure due to too many log lines, these are the top 3: 60100 FastRestoreLoaderDispatchRequests, 79655 FastRestoreGetVersionSize, 93888 FastRestoreSplitMutation.
* Fix typo in fdb.options
* update bindings/go/src/fdb/generated.go
* Fix getMappedRange metrics (release-7.1) (#9331). Metrics related to the getMappedRange API were counted twice; having a set of new metrics specifically for getMappedRange solves the issue.
* Fix clang init order issue
* Enable rocksdb in simulation in 7.1. Exclude FuzzApi and HighContention tests temporarily for rocksdb. (#9374)
* Fix IDE build and warnings
* Rocksdb knob changes. (#9393)
* Fix compiler warnings
* Add exclude to fdbcli's configure command. Right now this only allows one server address to be excluded. This is useful when the database is unavailable but we want recruitment to skip some particular processes. Manually tested that the concept works with a loopback cluster.
* Allow a comma-separated list of excluded addresses
* Add ClogTlog workload
* Update ClogTlog workload to be single region
* Exclude failed tlog if recovery is stuck for more than 30s. Because the tlog is clogged, recovery can get stuck in initializing_transaction_servers; this exclude allows the recovery to complete.
* Change to only clog once for a particular tlog. If we repeat clogging, different tlogs may be excluded, which can cause the recovery to get stuck.
* Move ClogTlog.toml to rare
* Fix rare test failures. Unclog after the DB is recovered; otherwise another recovery may become stuck again.
* Address review comments
* Allow fdbdecode to read filters from a file
* Fix filter delimiter and print sub versions
* Use KeyRangeMap for better matching performance (see the range-lookup sketch after these notes)
* fdbdecode: read backup range files
* add filtering
* Allow fdbdecode to read filters from a file
* Fix filter delimiter and print sub versions
* Use KeyRangeMap for better matching performance
* Disable filter validate by default
* Use RangeMap for backup agent filtering. This is more efficient than going through ranges one by one.
* Refactor code
* Allow fdbbackup, fdbrestore to read keyranges from a file
* Use the RangeMapFilters
* add command line option
* Clang-format
* Fix -t flag bug for fdbdecode (#9489)
* Fix fdbbackup query returning earliest version
* Query backup size from a specific snapshot
* clean format
* Explicitly use the min and max restorable versions from the backup description in the query command instead of going through snapshots
* fix clang build error
* Add more comments in fdbbackup query command, and address comments
* Change PTreeImpl::insert to overwrite existing entries (#9138), maintaining partial persistence of course (see the path-copying sketch after these notes). We could theoretically also avoid creating a new node if the insert version of the node comparing equal to `x` is the latestVersion; there isn't a generic way to tell from the ptree, though, since insertAt is a concept that only exists within VersionedMap. Either way, avoiding the `contains` call and the tree rotations is already a big improvement. The old node should only be reachable from old roots, and so it should get cleaned up as part of forgetVersions in the storage server.
* Update fdbclient/include/fdbclient/VersionedMap.h
* Avoid repeated search in VersionedMap::erase(iterator) (#9143)
* Use KeyspaceSnapshotFile to filter range files
* Change mutation and KV logging to SevInfo, and set a max length to avoid TraceEventOverflow.
* Output in HEX format for easy regex matching
* Refactor decoder to read the file as a whole at once, to reduce the number of network requests.
* Add more trace events
* Allow log router to detect slow peeks and to switch DC for peeking [release-7.1] (#9640)
* Add DcLag tests and workload
* Add disableSimSpeedup to clog network longer
* Ignore the DcLag test
* Refactor LogRouter's pullAsyncData
* Switch DC if log router peek becomes stuck, trying a different DC if this happens.
* Enable DcLag test
* Require at least 2 regions and satellites
* Simplify DcLag code
* Limit connection failures to be within tests. In particular, disable connection failures when initializing the database during the startup phase, i.e., before running with test specs.
* Revert disableSimSpeedup
* Fix conflicts after cherry-pick
* More fixes after cherry-pick
* Refactor to address comments
* Use a constant for connectionFailuresDisableDuration
* Fix ClogTlog workload valgrind error
* Address comments
* Reduce running time for DcLag. The switch can happen quicker than the workload detection time, so the detection time needs to be set lower than LOG_ROUTER_PEEK_SWITCH_DC_TIME.
* Fix issue where the versions on seed storage servers decreased (see the version-epoch sketch after these notes). Seed storage servers are recruited as the initial set of storage servers when a database is first created. They function a little differently than normal and do not set an initial version the way storage servers normally do when recruited (typically equal to the recovery version). Version correction is a feature where versions advance in sync with the clock and are equal across FDB clusters. To allow different FDB clusters to have matching versions, they must share the same base version; this defaults to the Unix epoch, and clusters with the version epoch enabled have a current version equal to the number of microseconds since the Unix epoch. When the version epoch is enabled on a cluster, it causes a one-time jump from the cluster's current version to the version based on the epoch. After a recovery, the recovery version sent to storage servers should have advanced by a significant amount. The recovery path contained a `BUGGIFY` to randomly advance the recovery version in simulation, testing the version epoch being enabled. However, it was also advancing the version during an initial recovery, when the seed storage servers are recruited. If a set of storage servers was recruited as seed servers, but another recovery occurred before the bootstrap process was complete, the randomly selected version increase could be smaller during the second recovery than during the first. This could cause the initial set of seed servers to think they should be at a version larger than what the cluster was actually at. The fix contained in this commit is to only cause a random version jump when the recovery is occurring on an existing database, not when it is recruiting seed storage servers. This commit fixes an issue found in simulation, reproducible with: Commit: 93dc4bfeb97a700bafa4b34bc18d38a248e47b35, Test: fast/DataLossRecovery.toml, Seed: 3101495991, Buggify: on, Compiler: clang.
* Added 7.1.28 and 7.1.29 release notes
* Reduce running time for ClogTlog. When ClogTlog is running, we may already be past 450s, i.e., SIM_SPEEDUP_AFTER_SECONDS, and clogging is no longer effective; if that's the case, we want to finish the test quickly.
* Remove profile code from SpecialKeySpace workload. This part of the code has problems with GlobalConfig and is buggy.
* disable AVX for 7.1.28 release
* enable AVX and update version for 7.1.29 release
* update version after 7.1.29 release
* Update info to trigger a new DB info update immediately
* Backport exclusion fix #9468 (#9789)
* Don't block the exclusion of stateless processes on the free capacity check
* Fix syntax
* Make use of precomputed exclude check
* Format code
* Only consider newly excluded processes
* Format code and update comment
* Fix finishedQueries metric, add metrics reporting in GetMappedRange test [release-7.1] (#9785)
* Fix finishedQueries metric, add metrics reporting in GetMappedRange test
* refactor to make format work
* resolve comments
* Fix more comments
* Fix bugs and change running time of test
* Adding rocksdb bloom filter knobs. (#9770)
* [Release 7.1] Do not update exclude/failed system metadata in excludeServers if the input list is already excluded/failed (#9809)
* Add a check in the excludeServers function so that if the exclusion list already exists, no new writes are issued.
* Update documentation
* Parameterize queue length in GetMappedRange test (#9808); also retry when operation_cancelled happens
* Add 7.1.30, 7.1.31 release notes (#9822)
* Don't stop iterating over all storage processes in exclusion check (#9869)
* checkSafeExclusion should always create a new ExclusionSafetyCheckRequest (#9871)
* RocksDB 7.10.2 version upgrade (#9829)
* Change single key deletions to delete based on the number of deletes instead of a byte limit.
* Implement a check for whether the locality is already excluded in the exclude locality command (#9878)
* Merge pull request #9814 from sbodagala/main (#9883): FdbServer not able to join cluster. Co-authored-by: Jingyu Zhou <[email protected]>
* Update 7.1.30 release notes
* Remove printable() from TSS trace events
* Fix release notes
* Fixed stuck data movement when a server is removed [release-7.1] (#9904). When a server is removed, dataDistributionRelocator doesn't remove the work for the destination storage workers; as a result, it can no longer move a shard into any of the healthy workers in the destination team.
* Avoid double-completing the work
* disable AVX for 7.1.30 release
* enable AVX and update version for 7.1.31 release

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Jingyu Zhou <[email protected]>
Co-authored-by: Dan Lambright <[email protected]>
Co-authored-by: FoundationDB CI <[email protected]>
Co-authored-by: neethuhaneesha <[email protected]>
Co-authored-by: Jingyu Zhou <[email protected]>
Co-authored-by: Hao Fu <[email protected]>
Co-authored-by: hao fu <[email protected]>
Co-authored-by: Yao Xiao <[email protected]>
Co-authored-by: Meng Xu <[email protected]>
Co-authored-by: Huiyoung <[email protected]>
Co-authored-by: sfc-gh-tclinkenbeard <[email protected]>
Co-authored-by: Zhe Wu <[email protected]>
Co-authored-by: Sreenath Bodagala <[email protected]>
Co-authored-by: Meng Xu <[email protected]>
Co-authored-by: Xiaoge Su <[email protected]>
Co-authored-by: Aaron Molitor <[email protected]>
Co-authored-by: Markus Pilman <[email protected]>
Co-authored-by: Bharadwaj V.R <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Dan Adkins <[email protected]>
Co-authored-by: Vishesh Yadav <[email protected]>
Co-authored-by: Andrew Noyes <[email protected]>
Co-authored-by: Lukas Joswiak <[email protected]>
Co-authored-by: Johannes Scheuermann <[email protected]>
Co-authored-by: Oleg Samarin <[email protected]>
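The byteLimit batching rule from the "add byteLimit for prefetch" note above is easiest to see in code. A minimal sketch, assuming a simplified index/record pair; the `Entry` struct, `applyByteLimit` name, and `strictlyEnforce` parameter are illustrative, not FDB's actual types:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative stand-in for an index entry plus its (optional) record.
struct Entry {
	int64_t indexBytes;  // index size: always counted toward the limit
	int64_t recordBytes; // record size, 0 if no record exists
};

// Sketch of the rule: count index bytes always, count the record when it
// exists, return at least one entry, and keep the entry that crosses the
// limit. With the STRICTLY_ENFORCE_BYTE_LIMIT knob set, entries fetched
// beyond that point are discarded; otherwise the whole batch is returned.
std::vector<Entry> applyByteLimit(const std::vector<Entry>& batch,
                                  int64_t byteLimit,
                                  bool strictlyEnforce) {
	std::vector<Entry> result;
	int64_t bytes = 0;
	for (const Entry& e : batch) {
		result.push_back(e); // the limit-crossing entry is still included
		bytes += e.indexBytes + e.recordBytes;
		if (bytes >= byteLimit && strictlyEnforce)
			break; // discard the rest of the already-fetched batch
	}
	return result;
}

int main() {
	std::vector<Entry> batch{ { 10, 90 }, { 10, 90 }, { 10, 90 } };
	// strict: stops once 150 bytes are reached, keeping the crossing entry
	assert(applyByteLimit(batch, 150, true).size() == 2);
	// non-strict: the whole fetched batch is returned
	assert(applyByteLimit(batch, 150, false).size() == 3);
}
```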
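The int32 overflow fix above comes down to parsing 64-bit quantities with 32-bit functions. A minimal before/after sketch; the `parseContentLength` helper is hypothetical, not FDB's actual HTTP code:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// HTTP bodies can exceed 2 GiB, so Content-Length must be parsed into a
// 64-bit integer. The buggy pattern used atoi(), whose int return value
// overflows for anything above INT32_MAX.
int64_t parseContentLength(const char* value) {
	// int length = atoi(value);         // buggy: overflows past 2147483647
	return strtoll(value, nullptr, 10);  // fixed: 64-bit safe
}

int main() {
	assert(parseContentLength("3000000000") == 3000000000LL); // > INT32_MAX
}
```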
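Several notes above replace linear scans over filter ranges with a KeyRangeMap/RangeMap lookup. The core idea, sketched here with `std::map` rather than FDB's actual KeyRangeMap, is that a single O(log n) search over range start keys replaces checking every range one by one (assuming non-overlapping, half-open ranges):

```cpp
#include <cassert>
#include <map>
#include <string>

// Map each filter range's begin key to its end key. Lookup is then one
// O(log n) search instead of a scan over every range.
using RangeFilter = std::map<std::string, std::string>;

bool matches(const RangeFilter& filter, const std::string& key) {
	auto it = filter.upper_bound(key); // first range beginning strictly after key
	if (it == filter.begin())
		return false;                  // no range begins at or before key
	--it;                              // candidate range that could contain key
	return key < it->second;           // inside [begin, end)?
}

int main() {
	RangeFilter f{ { "apple", "banana" }, { "cherry", "date" } };
	assert(matches(f, "avocado"));    // falls inside ["apple", "banana")
	assert(!matches(f, "blueberry")); // falls between the two ranges
}
```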
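The PTreeImpl::insert change above relies on path-copying persistence: overwriting a key produces a new root while old roots keep seeing the old value, with no separate `contains` lookup. A minimal sketch of that idea using an unbalanced persistent BST; FDB's real PTree also versions nodes and rebalances, which this deliberately omits:

```cpp
#include <cassert>
#include <memory>
#include <string>

// Path-copying (persistent) BST node: children are shared, immutable.
struct Node {
	std::string key, value;
	std::shared_ptr<const Node> left, right;
};
using Ptr = std::shared_ptr<const Node>;

// insert returns a new root, copying only the nodes on the search path.
// On an equal key it overwrites the value in the copy, so no separate
// contains()/erase step is needed; old roots still see the old value.
Ptr insert(const Ptr& root, std::string key, std::string value) {
	if (!root)
		return std::make_shared<const Node>(Node{ std::move(key), std::move(value), nullptr, nullptr });
	if (key < root->key)
		return std::make_shared<const Node>(
		    Node{ root->key, root->value, insert(root->left, std::move(key), std::move(value)), root->right });
	if (root->key < key)
		return std::make_shared<const Node>(
		    Node{ root->key, root->value, root->left, insert(root->right, std::move(key), std::move(value)) });
	// Equal keys: overwrite in the copy (the optimization in the note).
	return std::make_shared<const Node>(Node{ root->key, std::move(value), root->left, root->right });
}

int main() {
	Ptr v1 = insert(nullptr, "k", "old");
	Ptr v2 = insert(v1, "k", "new"); // overwrite, no contains()/rotations
	assert(v1->value == "old");      // old root unchanged: partial persistence
	assert(v2->value == "new");
}
```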
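The version-epoch behavior described in the seed storage server note has simple arithmetic behind it: with the epoch enabled, the expected cluster version is the number of microseconds elapsed since the chosen base version (default: the Unix epoch), which is how separate clusters converge on matching versions. A worked sketch; the `expectedVersion` function is illustrative, not FDB's internal code:

```cpp
#include <chrono>
#include <cstdint>
#include <iostream>

// With the version epoch enabled, the cluster's current version should
// equal the microseconds elapsed since the base version. Clusters that
// share the same base therefore have matching versions.
int64_t expectedVersion(int64_t epochBaseSeconds /* 0 = Unix epoch */) {
	using namespace std::chrono;
	int64_t nowUs =
	    duration_cast<microseconds>(system_clock::now().time_since_epoch()).count();
	return nowUs - epochBaseSeconds * 1'000'000;
}

int main() {
	// With the default base, this prints roughly 1.67e15 at 2023-01-01T00:00:00Z.
	std::cout << expectedVersion(0) << "\n";
}
```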
v7.2.5-1.ow
Added a check for GitHub version correctness (#64)

* Switch to a new build image
* Build image on tag
* Added a check for GitHub version correctness (cherry picked from commit b9769f13b380dc56757a2d7648f6b4e2843aacdd)

---------

Co-authored-by: Oleg Samarin <[email protected]>
v7.1.29-1.ow
Merged upstream 7.1.29 (#60)

* Fix transaction_too_old error when version vector is enabled. When VV is enabled, the comparison of storage server version and read version should use the original read version; otherwise, the client may get a wrong transaction_too_old error.
* Fix assertions w.r.t. VV
* Avoid using oldest version as read version for VV
* Disable a debugging trace event
* Cherry pick 8630
* Address review comments
* enable AVX and update version for 7.1.25 release
* skip proxy when fetching kubectl
* add generated.go
* update version after 7.1.25 release
* Add changes to generated.go from PR8761, and remove change to ConfigureCompiler.cmake
* Update generated.go
* Update generated.go with 8761
* Rocksdb stats level knob. (#8713)
* Adding counters for singlekey clear requests (#8792)
* add byteLimit for prefetch. This is a patch to release-7.1, after resolving conflicts from the commit in the main branch, to enable byteLimit in release-7.1. A fraction of byteLimit is used as the limit for fetching the index; for the indexes fetched, records are fetched in batches. byteLimit always counts the index size and also counts the record if it exists; at least one index-record entry is returned, and the last entry is always included even if adding it exceeds the limit. There is a knob, STRICTLY_ENFORCE_BYTE_LIMIT: when it is set, records are discarded once byteLimit is hit, even though they were already fetched; otherwise, the whole batch is returned.
* debug seg fault
* Revert "debug seg fault". This reverts commit fadcb0820c8a5901bbefe70825bd7a77ebc93081.
* [release-7.1] Add SS read range bytes metrics. (#8697) (#8724)
* Add SS read range bytes metrics. (#8697)
* Fix build failure
* clang-fmt
* fmt
* Rocksdb suggest compact range checks
* RocksDB 7.7.3 version upgrade
* Fix backup worker assertion failure. The number of released bytes exceeded the number of acquired bytes in locks, because the bytes counted towards release were calculated after a "wait", when more bytes could be allocated.
* Increase buggified lock bytes for backup workers, to fix simulation failures where the knob value was too small.
* Send error when LogRouterPeekPopped happens. Otherwise, the remote tlog won't get a response and the parallel peek requests will never be cleared, blocking subsequent peeks. As a result, the remote tlog can no longer pop the log router, which in turn can no longer peek tlogs, and the whole remote side becomes blocked.
* Add more debug events
* Add DebugTrace.h to 7.1 branch. Cherry-picking PR #8856 requires DebugTrace.h due to the use of the DebugLogTraceEvent function.
* Fix int32 variable overflow: 1. the content length from an HTTP response parsed with 'atoi' could overflow int32; 2. aligning the offset could overflow int32.
* Fix -Wformat warning
* Make gray failure degraded server selection deterministic
* format source code after switch to clang 15
* Fix clang 15 compile errors
* Fix gcc 11 compile errors
* Fix more warnings
* Moving rocksdb read iterator destruction from commit path to actor. (#8971)
* Release 7.1: Cherry pick pull request #9033 (#9037)
* Merge pull request #9033 from sbodagala/main
* Code formatting. Co-authored-by: Jingyu Zhou <[email protected]>
* Fix: Exclusion stuck because DD cannot build new teams. Bug behavior: when DD has zero healthy machine teams but more unhealthy machine teams than the max machine teams DD plans to build, DD stops building new machine teams. Due to zero healthy machine teams (and zero healthy server teams), DD cannot find a healthy destination team to relocate data; when data relocation stops, exclusion stops progressing and gets stuck. The bug happens when we *shrink* a k-host cluster by first adding k/2 new hosts, then quickly excluding all old hosts. Fix: let DD build temporary extra teams to relocate data; the extra teams are cleaned up later by DD's remove-extra-teams logic. Simulation test: there is no simulation test covering the cluster expansion scenario, so to most closely simulate this behavior we intentionally overbuild all possible machine teams to trigger the condition that unhealthy teams exceed the maximum teams DD wants to build later.
* Resolve review comment: No functional change
* Add back samples for (non)empty peeks stats [release-7.1] (#9074). These were lost, likely due to refactoring. Now TLogMetrics have meaningful data like: TLogMetrics ID=59ec9c67b4d07433 Elapsed=5 BytesInput=0 -1 17048 BytesDurable=47.4 225.405 17048 BlockingPeeks=0 -1 0 BlockingPeekTimeouts=0 -1 0 EmptyPeeks=1.6 2.79237 236 NonEmptyPeeks=0 -1 32 ...
* Use LATENCY_SAMPLE_SIZE
* fix health monitor last logged time
* Backport RocksDB cmake file to 7.1 (#9093)
* Fix the RocksDB compile issue with clang. By default, RocksDB uses its own compile/link flags regardless of FDB's. This led to the issue that if FDB decides to use clang/lld/libc++, RocksDB picks up the compiler/linker but still uses libstdc++, which is incompatible with libc++, causing missing-symbol errors during the link stage. With this patch, if FDB uses libc++, that information is stored in CMAKE_CXX_FLAGS and forwarded to RocksDB; RocksDB then uses libc++ and is compatible with FDB.
* fixup! Fix the clang error in bindings/c
* add some rocksdb compile options that can be passed in at build time
* Disconnection to satellite TLog should trigger recovery in gray failure detection
* Upgrade sphinx and document test harness and code probes
* Apply suggestions from code review. Co-authored-by: Trevor Clinkenbeard <[email protected]> Co-authored-by: Bharadwaj V.R <[email protected]>
* clarify how code probes are reported
* clarify statistics of TestHarness
* Bump setuptools from 65.3.0 to 65.5.1 in /documentation/sphinx. Bumps [setuptools](https://github.com/pypa/setuptools) from 65.3.0 to 65.5.1. [Release notes](https://github.com/pypa/setuptools/releases) · [Changelog](https://github.com/pypa/setuptools/blob/main/CHANGES.rst) · [Commits](https://github.com/pypa/setuptools/compare/v65.3.0...v65.5.1). updated-dependencies: dependency-name: setuptools, dependency-type: direct:production. Signed-off-by: dependabot[bot] <[email protected]>
* Add event for txn server initialization and a warning for TLog slow catching up
* Change TLog pull async data warning timeout
* Adding rocksDB control compaction on deletion knobs. (#9165)
* Add 7.1.26, 7.1.27 release notes (#9186)
* Added metrics for read range operations.
* Log PingLatency when there are no ping latency samples but there are ping attempts
* Changing histogram type. (#9227)
* Release 7.1: Cherry pick pull request #9225 (#9252). Do not add fdbserver processes to the client list. (#9225) Note: server processes started getting reported as clients since 7.1.0 (not sure whether this change in behavior was intentional), and this breaks the operator upgrade logic.
* Address a compilation error
* Update release-notes.
* Address a review comment/CI failure.
* Address a CI failure related to release notes.
* disable AVX for 7.1.26 release
* enable AVX and update version for 7.1.27 release
* update version after 7.1.27 release
* Increase buggified lock bytes for backup workers to at least 256 MB. We still encountered simulation failures where the backup worker is waiting on the lock and an assertion fails.
* Reduce logging level for verbose events. From one nightly failure due to too many log lines, these are the top 3: 60100 FastRestoreLoaderDispatchRequests, 79655 FastRestoreGetVersionSize, 93888 FastRestoreSplitMutation.
* Fix typo in fdb.options
* update bindings/go/src/fdb/generated.go
* Fix getMappedRange metrics (release-7.1) (#9331). Metrics related to the getMappedRange API were counted twice; having a set of new metrics specifically for getMappedRange solves the issue.
* Fix clang init order issue
* Enable rocksdb in simulation in 7.1. Exclude FuzzApi and HighContention tests temporarily for rocksdb. (#9374)
* Fix IDE build and warnings
* Rocksdb knob changes. (#9393)
* Fix compiler warnings
* Add exclude to fdbcli's configure command. Right now this only allows one server address to be excluded. This is useful when the database is unavailable but we want recruitment to skip some particular processes. Manually tested that the concept works with a loopback cluster.
* Allow a comma-separated list of excluded addresses
* Add ClogTlog workload
* Update ClogTlog workload to be single region
* Exclude failed tlog if recovery is stuck for more than 30s. Because the tlog is clogged, recovery can get stuck in initializing_transaction_servers; this exclude allows the recovery to complete.
* Change to only clog once for a particular tlog. If we repeat clogging, different tlogs may be excluded, which can cause the recovery to get stuck.
* Move ClogTlog.toml to rare
* Fix rare test failures. Unclog after the DB is recovered; otherwise another recovery may become stuck again.
* Address review comments
* Allow fdbdecode to read filters from a file
* Fix filter delimiter and print sub versions
* Use KeyRangeMap for better matching performance
* fdbdecode: read backup range files
* add filtering
* Allow fdbdecode to read filters from a file
* Fix filter delimiter and print sub versions
* Use KeyRangeMap for better matching performance
* Disable filter validate by default
* Use RangeMap for backup agent filtering. This is more efficient than going through ranges one by one.
* Refactor code
* Allow fdbbackup, fdbrestore to read keyranges from a file
* Use the RangeMapFilters
* add command line option
* Clang-format
* Fix -t flag bug for fdbdecode (#9489)
* Fix fdbbackup query returning earliest version
* Query backup size from a specific snapshot
* clean format
* Explicitly use the min and max restorable versions from the backup description in the query command instead of going through snapshots
* fix clang build error
* Add more comments in fdbbackup query command, and address comments
* Change PTreeImpl::insert to overwrite existing entries (#9138), maintaining partial persistence of course. We could theoretically also avoid creating a new node if the insert version of the node comparing equal to `x` is the latestVersion; there isn't a generic way to tell from the ptree, though, since insertAt is a concept that only exists within VersionedMap. Either way, avoiding the `contains` call and the tree rotations is already a big improvement. The old node should only be reachable from old roots, and so it should get cleaned up as part of forgetVersions in the storage server.
* Update fdbclient/include/fdbclient/VersionedMap.h
* Avoid repeated search in VersionedMap::erase(iterator) (#9143)
* Use KeyspaceSnapshotFile to filter range files
* Change mutation and KV logging to SevInfo, and set a max length to avoid TraceEventOverflow.
* Output in HEX format for easy regex matching
* Refactor decoder to read the file as a whole at once, to reduce the number of network requests.
* Add more trace events
* Allow log router to detect slow peeks and to switch DC for peeking [release-7.1] (#9640)
* Add DcLag tests and workload
* Add disableSimSpeedup to clog network longer
* Ignore the DcLag test
* Refactor LogRouter's pullAsyncData
* Switch DC if log router peek becomes stuck, trying a different DC if this happens.
* Enable DcLag test
* Require at least 2 regions and satellites
* Simplify DcLag code
* Limit connection failures to be within tests. In particular, disable connection failures when initializing the database during the startup phase, i.e., before running with test specs.
* Revert disableSimSpeedup
* Fix conflicts after cherry-pick
* More fixes after cherry-pick
* Refactor to address comments
* Use a constant for connectionFailuresDisableDuration
* Fix ClogTlog workload valgrind error
* Address comments
* Reduce running time for DcLag. The switch can happen quicker than the workload detection time, so the detection time needs to be set lower than LOG_ROUTER_PEEK_SWITCH_DC_TIME.
* Fix issue where the versions on seed storage servers decreased. Seed storage servers are recruited as the initial set of storage servers when a database is first created. They function a little differently than normal and do not set an initial version the way storage servers normally do when recruited (typically equal to the recovery version). Version correction is a feature where versions advance in sync with the clock and are equal across FDB clusters. To allow different FDB clusters to have matching versions, they must share the same base version; this defaults to the Unix epoch, and clusters with the version epoch enabled have a current version equal to the number of microseconds since the Unix epoch. When the version epoch is enabled on a cluster, it causes a one-time jump from the cluster's current version to the version based on the epoch. After a recovery, the recovery version sent to storage servers should have advanced by a significant amount. The recovery path contained a `BUGGIFY` to randomly advance the recovery version in simulation, testing the version epoch being enabled. However, it was also advancing the version during an initial recovery, when the seed storage servers are recruited. If a set of storage servers was recruited as seed servers, but another recovery occurred before the bootstrap process was complete, the randomly selected version increase could be smaller during the second recovery than during the first. This could cause the initial set of seed servers to think they should be at a version larger than what the cluster was actually at. The fix contained in this commit is to only cause a random version jump when the recovery is occurring on an existing database, not when it is recruiting seed storage servers. This commit fixes an issue found in simulation, reproducible with: Commit: 93dc4bfeb97a700bafa4b34bc18d38a248e47b35, Test: fast/DataLossRecovery.toml, Seed: 3101495991, Buggify: on, Compiler: clang.
* Added 7.1.28 and 7.1.29 release notes
* Reduce running time for ClogTlog. When ClogTlog is running, we may already be past 450s, i.e., SIM_SPEEDUP_AFTER_SECONDS, and clogging is no longer effective; if that's the case, we want to finish the test quickly.
* disable AVX for 7.1.28 release
* enable AVX and update version for 7.1.29 release
* Removed old stuff for clang compilation
* Fixed compilation error with clang 15
* Fixed deploy on Debian
* Moved the test deploy logic from the git workflow file to a bash script.
* Fixed workflow syntax
* Added libatomic to the dockerfile

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Jingyu Zhou <[email protected]>
Co-authored-by: Dan Lambright <[email protected]>
Co-authored-by: FoundationDB CI <[email protected]>
Co-authored-by: neethuhaneesha <[email protected]>
Co-authored-by: Jingyu Zhou <[email protected]>
Co-authored-by: Hao Fu <[email protected]>
Co-authored-by: hao fu <[email protected]>
Co-authored-by: Yao Xiao <[email protected]>
Co-authored-by: Meng Xu <[email protected]>
Co-authored-by: Huiyoung <[email protected]>
Co-authored-by: sfc-gh-tclinkenbeard <[email protected]>
Co-authored-by: Zhe Wu <[email protected]>
Co-authored-by: Sreenath Bodagala <[email protected]>
Co-authored-by: Meng Xu <[email protected]>
Co-authored-by: Xiaoge Su <[email protected]>
Co-authored-by: Aaron Molitor <[email protected]>
Co-authored-by: Markus Pilman <[email protected]>
Co-authored-by: Bharadwaj V.R <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Dan Adkins <[email protected]>
Co-authored-by: Vishesh Yadav <[email protected]>
Co-authored-by: Andrew Noyes <[email protected]>
Co-authored-by: Lukas Joswiak <[email protected]>
Co-authored-by: Oleg Samarin <[email protected]>
v7.1.25-6.ow
XDB-144 Disabled building docker images on intermediate github builds…
v6.3.25-6.ow
XDB-144 Disabled building docker images on intermediate github builds…
v7.1.25-5.ow
XDB-102 Fixed comparison of CMake versions 3.7 and 3.13 (#47)
v6.3.25-5.ow
XDB-102 Moved the build logic from the github workflow to bash script…