feat:big key metricx #3000
base: unstable
Conversation
Walkthrough

This pull request enhances the system's ability to track and log statistics about large keys.

Changes
Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant InfoCmd
    participant RedisDB
    Client->>InfoCmd: Request InfoStats
    loop For each database
        InfoCmd->>RedisDB: GetBigKeyStatistics()
        RedisDB-->>InfoCmd: Return big key count
    end
    InfoCmd-->>Client: Return aggregated info (including big_key_count)

sequenceDiagram
    participant Client
    participant Storage
    participant DBInstance
    Client->>Storage: GetBigKeyStatistics(property, out)
    loop For each DB instance in Storage
        Storage->>DBInstance: GetBigKeyStatistics()
        DBInstance-->>Storage: Return count
    end
    Storage-->>Client: Aggregated count and status
Actionable comments posted: 4
🔭 Outside diff range comments (2)
src/storage/src/redis_zsets.cc (2)
Line range hint 87-1959: Consider deduplication for frequent key access.

The CheckBigKeyAndLog call is consistently added across all zset operations, but this could lead to duplicate logging of the same key in rapid succession. Consider:

- Adding a cooldown period between logs for the same key (see the sketch below)
- Implementing a logging rate limiter
- Aggregating logs for the same key over a time window
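A minimal sketch of the per-key cooldown idea, assuming a steady-clock timestamp map guarded by its own mutex (the function and variable names are illustrative, not from the PR):

```cpp
#include <chrono>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical helper: returns true when `key` has not been logged within
// `cooldown`, and records the new log time. Entries are overwritten in
// place, so the map stays bounded by the number of distinct big keys.
bool ShouldLogBigKey(const std::string& key,
                     std::chrono::seconds cooldown = std::chrono::seconds(60)) {
  static std::mutex mu;
  static std::unordered_map<std::string,
                            std::chrono::steady_clock::time_point> last_logged;

  const auto now = std::chrono::steady_clock::now();
  std::lock_guard<std::mutex> lock(mu);
  auto it = last_logged.find(key);
  if (it != last_logged.end() && now - it->second < cooldown) {
    return false;  // still inside the cooldown window; suppress this log
  }
  last_logged[key] = now;
  return true;
}
```

CheckBigKeyAndLog could then wrap its logging statement in `if (ShouldLogBigKey(key)) { ... }` while still incrementing the access counter unconditionally.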
Line range hint 87-1959: Add error handling for monitoring.

The monitoring implementation should handle failures gracefully:

- Add error handling for logging failures
- Implement timeouts for monitoring operations
- Add a circuit breaker to disable monitoring if it causes issues (a sketch follows this list)
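One way to read the circuit-breaker bullet, as a sketch (the class and threshold are assumptions, not code from the PR):

```cpp
#include <atomic>

// Hypothetical breaker: after `max_failures` monitoring/logging failures,
// monitoring stays disabled for the remainder of the process lifetime.
class MonitoringBreaker {
 public:
  explicit MonitoringBreaker(int max_failures) : max_failures_(max_failures) {}

  bool Allowed() const { return !tripped_.load(std::memory_order_relaxed); }

  void RecordFailure() {
    if (failures_.fetch_add(1, std::memory_order_relaxed) + 1 >= max_failures_) {
      tripped_.store(true, std::memory_order_relaxed);
    }
  }

 private:
  const int max_failures_;
  std::atomic<int> failures_{0};
  std::atomic<bool> tripped_{false};
};
```

Each operation would check `breaker.Allowed()` before calling CheckBigKeyAndLog and call `breaker.RecordFailure()` when logging fails or times out.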
🧹 Nitpick comments (9)
src/storage/src/redis.h (2)

373-376: Optimize GetBigKeyStatistics performance

The current implementation creates a full copy of the map on each call, which could be expensive for large maps. Consider adding filtering and pagination:

-  std::unordered_map<std::string, int> GetBigKeyStatistics() {
+  std::unordered_map<std::string, int> GetBigKeyStatistics(
+      size_t limit = 100,
+      uint32_t min_count = 1) {
     std::lock_guard<std::mutex> lock(big_key_access_mutex_);
-    return big_key_access_count_;
+    std::unordered_map<std::string, int> filtered_stats;
+
+    for (const auto& pair : big_key_access_count_) {
+      if (pair.second >= min_count && filtered_stats.size() < limit) {
+        filtered_stats.insert(pair);
+      }
+    }
+    return filtered_stats;
   }
558-559: Add documentation for big key tracking members

Add comments to explain the purpose and lifecycle of these member variables:

+  // Tracks the number of times each key exceeding the size threshold was accessed
   std::unordered_map<std::string, int> big_key_access_count_;
+  // Protects concurrent access to big_key_access_count_
   std::mutex big_key_access_mutex_;

include/pika_server.h (1)
313-313: Add documentation for GetBigKeyStatistics

Add a descriptive comment explaining the method's purpose, return value format, and usage:

+  /**
+   * Returns statistics about keys that exceed the configured size threshold.
+   * @return Map of key names to their access counts
+   * @note This is an expensive operation as it creates a copy of the statistics
+   */
   std::unordered_map<std::string, int> GetBigKeyStatistics();

src/storage/src/redis_sets.cc (1)
93-93: LGTM: Monitoring large sets before operations

The addition of CheckBigKeyAndLog helps track large sets before performing add operations. Consider adding error handling for the CheckBigKeyAndLog call to ensure monitoring failures don't impact operations.

src/storage/src/redis_zsets.cc (2)
87-87: LGTM! Consider batching the logging calls.

The placement of CheckBigKeyAndLog is correct - after meta validation but before operations. However, since this monitoring is added to many methods, consider batching the log writes to reduce I/O overhead.
Line range hint 87-1959: Consider performance impact of monitoring.

The universal addition of monitoring could impact performance, especially for high-frequency operations. Consider:

- Sampling instead of monitoring every operation (a sampling sketch follows this list)
- Making the monitoring frequency configurable
- Adding metrics to track the monitoring overhead
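A sampling gate could look like the following sketch (the helper name and the 1-in-N scheme are assumptions):

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical gate: lets through roughly 1 out of every `rate` calls.
// A relaxed atomic keeps the fast path to a single increment.
inline bool SampleThisCall(uint32_t rate) {
  static std::atomic<uint64_t> counter{0};
  return counter.fetch_add(1, std::memory_order_relaxed) % rate == 0;
}

// Usage sketch inside a zset operation, sampling ~1% of calls:
//   if (SampleThisCall(100)) {
//     CheckBigKeyAndLog(key.ToString(), parsed_zsets_meta_value.Count());
//   }
```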
src/pika_admin.cc (1)

1094-1098: LGTM! Consider adding total count for better monitoring.

The big key statistics section is well integrated into InfoStats. Consider adding a total count of big keys at the start of the section for easier monitoring:

   tmp_stream << "# Big Key Statistics\r\n";
+  tmp_stream << "total_big_keys:" << big_key_stats.size() << "\r\n";
   for (const auto& entry : big_key_stats) {
     tmp_stream << "key:" << entry.first << ", access_count:" << entry.second << "\r\n";
   }
src/storage/src/redis_lists.cc (1)

82-82: Consider the performance impact of CheckBigKeyAndLog calls.

The CheckBigKeyAndLog function is called on every list operation, including reads. For frequently accessed large lists, this could introduce additional overhead.

Consider optimizing by:

- Adding sampling or throttling for read operations
- Caching the big key status for a short duration (see the sketch after the line list below)
- Making the check asynchronous for read operations
Also applies to: 127-127, 163-163, 204-204, 244-244, 280-280, 339-339, 396-396, 441-441, 502-502, 572-572, 705-705, 750-750, 824-824, 1133-1133, 1174-1174, 1212-1212, 1252-1252, 1292-1292
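A short-duration cache for the big-key status could be sketched as below, assuming a steady-clock TTL (this class does not exist in the PR):

```cpp
#include <chrono>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical cache of "is this key big?" answers, so hot read paths can
// skip the size check for `ttl` after the last check of the same key.
class BigKeyStatusCache {
 public:
  explicit BigKeyStatusCache(std::chrono::seconds ttl) : ttl_(ttl) {}

  // Returns true and fills *is_big when a fresh cached answer exists.
  bool Lookup(const std::string& key, bool* is_big) {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = entries_.find(key);
    if (it == entries_.end() ||
        std::chrono::steady_clock::now() - it->second.stored > ttl_) {
      return false;  // no entry, or entry expired
    }
    *is_big = it->second.is_big;
    return true;
  }

  void Store(const std::string& key, bool is_big) {
    std::lock_guard<std::mutex> lock(mu_);
    entries_[key] = {is_big, std::chrono::steady_clock::now()};
  }

 private:
  struct Entry {
    bool is_big;
    std::chrono::steady_clock::time_point stored;
  };
  const std::chrono::seconds ttl_;
  std::mutex mu_;
  std::unordered_map<std::string, Entry> entries_;
};
```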
src/storage/src/redis_hashes.cc (1)

Line range hint 82-82: Consider architectural improvements for big key monitoring.

The current implementation adds monitoring consistently across data structures, but could benefit from the following architectural improvements (a configuration sketch follows at the end of this comment):
Configuration Management:
- Add configurable thresholds for what constitutes a "big key"
- Allow different thresholds for different data structures
- Enable/disable monitoring per data structure
Performance Optimization:
- Implement sampling for high-traffic keys
- Add throttling mechanisms to prevent monitoring overhead
- Consider async monitoring for read operations
Monitoring Strategy:
- Add gradual levels of monitoring (warn, critical, etc.)
- Implement rate limiting for logging
- Consider aggregating stats before logging
Would you like me to propose a detailed design for any of these improvements?
Also applies to: 127-127, 163-163, 204-204, 244-244, 280-280, 339-339, 396-396, 441-441, 502-502, 572-572, 705-705, 750-750, 824-824, 1133-1133, 1174-1174, 1212-1212, 1252-1252, 1292-1292, 98-98, 163-163, 204-204, 248-248, 306-306, 400-400, 485-485, 529-529, 569-569, 634-634, 710-710, 786-786, 850-850, 914-914, 992-992, 1063-1063, 1135-1135, 1198-1198, 1239-1239, 1277-1277, 1317-1317, 1357-1357
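As a rough illustration of the configuration-management bullets, per-structure settings might be grouped like this (every field name here is hypothetical; none exists in pika.conf today):

```cpp
#include <cstdint>

// Hypothetical per-structure knobs for big key monitoring.
struct BigKeyMonitorConfig {
  bool enabled = true;                // enable/disable per data structure
  uint64_t threshold = 10000;         // element count that counts as "big"
  uint32_t sample_rate = 1;           // check 1 out of every N operations
  uint32_t max_logs_per_minute = 60;  // rate limit for warning logs
};

// One config per data structure, so thresholds can differ by type.
struct BigKeyMonitorSettings {
  BigKeyMonitorConfig hashes;
  BigKeyMonitorConfig lists;
  BigKeyMonitorConfig sets;
  BigKeyMonitorConfig zsets;
};

// Usage sketch: a stricter limit for lists than the default.
//   BigKeyMonitorSettings settings;
//   settings.lists.threshold = 5000;
```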
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
include/pika_server.h
(1 hunks)src/pika_admin.cc
(1 hunks)src/pika_server.cc
(1 hunks)src/storage/src/redis.h
(2 hunks)src/storage/src/redis_hashes.cc
(22 hunks)src/storage/src/redis_lists.cc
(17 hunks)src/storage/src/redis_sets.cc
(10 hunks)src/storage/src/redis_zsets.cc
(25 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: build_on_ubuntu
- GitHub Check: build_on_macos
- GitHub Check: build_on_centos
- GitHub Check: Analyze (go)
🔇 Additional comments (6)
src/storage/src/redis_sets.cc (6)
173-173: LGTM: Consistent monitoring pattern

The CheckBigKeyAndLog call maintains consistency with other set operations for monitoring large keys.

939-939: LGTM: Monitoring before pop operations

Appropriate placement of CheckBigKeyAndLog before performing pop operations.

1057-1057: LGTM: Monitoring before random member access

The CheckBigKeyAndLog call helps track large sets before random member operations.

1132-1132: LGTM: Monitoring before remove operations

Consistent monitoring pattern maintained for remove operations.

1362-1362: LGTM: Monitoring before scan operations

The CheckBigKeyAndLog call helps track large sets during scan operations.

1513-1513: LGTM: Monitoring before TTL operations

Consistent monitoring pattern maintained across expiry-related operations (Expireat, Persist, TTL).

Also applies to: 1553-1553, 1593-1593
void CheckBigKeyAndLog(const std::string& key, uint64_t size) {
  static const uint64_t kBigKeyThreshold = 10000;
  if (size > kBigKeyThreshold) {
    std::lock_guard<std::mutex> lock(big_key_access_mutex_);
    big_key_access_count_[key]++;
    std::cerr << "[BIGKEY DETECTED] Key: " << key
              << ", Size: " << size
              << ", Access Count: " << big_key_access_count_[key] << std::endl;
  }
}
🛠️ Refactor suggestion
Consider improving the big key monitoring implementation
The current implementation has several areas that could be enhanced:
- The threshold should be configurable rather than hardcoded
- Logging to stderr may not be suitable for production environments
- Consider adding rate limiting to prevent excessive logging
- Add a cleanup mechanism for big_key_access_count_ to prevent unbounded memory growth
Consider these improvements:
+ // Configurable threshold with a reasonable default
+ static const uint64_t kDefaultBigKeyThreshold = 10000;
+ uint64_t big_key_threshold_;
+
void CheckBigKeyAndLog(const std::string& key, uint64_t size) {
- static const uint64_t kBigKeyThreshold = 10000;
- if (size > kBigKeyThreshold) {
+ if (size > big_key_threshold_) {
std::lock_guard<std::mutex> lock(big_key_access_mutex_);
+
+ // Rate limit logging using a token bucket
+ if (ShouldLog()) {
big_key_access_count_[key]++;
- std::cerr << "[BIGKEY DETECTED] Key: " << key
- << ", Size: " << size
- << ", Access Count: " << big_key_access_count_[key] << std::endl;
+ LOG(WARNING) << "[BIGKEY DETECTED] Key: " << key
+ << ", Size: " << size
+ << ", Access Count: " << big_key_access_count_[key];
+ }
+
+ // Cleanup old entries periodically
+ MaybeCleanupBigKeyMap();
}
}
Committable suggestion skipped: line range outside the PR's diff.
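For concreteness, the `ShouldLog()` and `MaybeCleanupBigKeyMap()` calls referenced in that diff could be filled in along these lines (both are hypothetical helpers; neither exists in the PR):

```cpp
#include <algorithm>
#include <chrono>
#include <string>
#include <unordered_map>

// Hypothetical token bucket: allows at most `capacity` logs in a burst,
// refilled at `per_second` tokens per second. The caller is expected to
// already hold big_key_access_mutex_, matching the diff above.
class BigKeyLogLimiter {
 public:
  BigKeyLogLimiter(double capacity, double per_second)
      : capacity_(capacity), per_second_(per_second),
        tokens_(capacity), last_(std::chrono::steady_clock::now()) {}

  bool ShouldLog() {
    const auto now = std::chrono::steady_clock::now();
    const double elapsed = std::chrono::duration<double>(now - last_).count();
    last_ = now;
    tokens_ = std::min(capacity_, tokens_ + elapsed * per_second_);
    if (tokens_ < 1.0) return false;
    tokens_ -= 1.0;
    return true;
  }

 private:
  const double capacity_;
  const double per_second_;
  double tokens_;
  std::chrono::steady_clock::time_point last_;
};

// Hypothetical cleanup: drop the whole map once it grows past a bound.
// A production version would more likely evict least-recently-seen keys.
void MaybeCleanupBigKeyMap(std::unordered_map<std::string, int>* counts,
                           size_t max_entries = 100000) {
  if (counts->size() > max_entries) {
    counts->clear();
  }
}
```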
src/pika_server.cc (Outdated)
std::unordered_map<std::string, int> PikaServer::GetBigKeyStatistics() {
  std::unordered_map<std::string, int> aggregated_stats;

  for (const auto& db_entry : dbs_) {
    auto db_name = db_entry.first;
    auto db_instance = db_entry.second;

    if (db_instance) {
      auto stats = g_pika_server->GetBigKeyStatistics();
      for (const auto& entry : stats) {
        aggregated_stats[entry.first] += entry.second;
      }
    }
  }

  return aggregated_stats;
}
Fix infinite recursion in GetBigKeyStatistics
The method has a critical infinite recursion issue: it calls itself on line 814, which will cause a stack overflow.
Apply this fix to resolve the recursion:
std::unordered_map<std::string, int> PikaServer::GetBigKeyStatistics() {
std::unordered_map<std::string, int> aggregated_stats;
for (const auto& db_entry : dbs_) {
auto db_name = db_entry.first;
auto db_instance = db_entry.second;
if (db_instance) {
- auto stats = g_pika_server->GetBigKeyStatistics();
+ auto stats = db_instance->storage()->GetBigKeyStatistics();
for (const auto& entry : stats) {
aggregated_stats[entry.first] += entry.second;
}
}
}
return aggregated_stats;
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
std::unordered_map<std::string, int> PikaServer::GetBigKeyStatistics() {
  std::unordered_map<std::string, int> aggregated_stats;
  for (const auto& db_entry : dbs_) {
    auto db_name = db_entry.first;
    auto db_instance = db_entry.second;
    if (db_instance) {
      auto stats = db_instance->storage()->GetBigKeyStatistics();
      for (const auto& entry : stats) {
        aggregated_stats[entry.first] += entry.second;
      }
    }
  }
  return aggregated_stats;
}
@@ -79,6 +79,7 @@ Status Redis::LIndex(const Slice& key, int64_t index, std::string* element) {
   }
   if (s.ok()) {
     ParsedListsMetaValue parsed_lists_meta_value(&meta_value);
+    CheckBigKeyAndLog(key.ToString(), parsed_lists_meta_value.Count());
💡 Codebase verification
🛠️ Refactor suggestion
Verification confirms missing CheckBigKeyAndLog call in ListsExpire
The review comment is correct. The CheckBigKeyAndLog function is consistently used across all Redis data type operations, including expire methods, but is missing in ListsExpire. The call should be added after parsing the meta value but before checking the stale state, matching the pattern used in other methods.
The suggested fix in the review comment is accurate:
 if (s.ok()) {
   ParsedListsMetaValue parsed_lists_meta_value(&meta_value);
+  CheckBigKeyAndLog(key.ToString(), parsed_lists_meta_value.Count());
   if (parsed_lists_meta_value.IsStale()) {
     return Status::NotFound("Stale");
🔗 Analysis chain
Verify consistent placement of CheckBigKeyAndLog calls.
The CheckBigKeyAndLog function has been added consistently across all list operations. The function is called at the right location - after parsing the meta value but before checking the stale state, which ensures accurate tracking of large keys before any operation is performed.
However, there's one inconsistency in the ListsExpire method at line 1133 where the CheckBigKeyAndLog call is missing.
Let's verify if this is intentional by checking the implementation in other data structures:
Add the missing CheckBigKeyAndLog call in ListsExpire for consistency:
 if (s.ok()) {
   ParsedListsMetaValue parsed_lists_meta_value(&meta_value);
+  CheckBigKeyAndLog(key.ToString(), parsed_lists_meta_value.Count());
   if (parsed_lists_meta_value.IsStale()) {
     return Status::NotFound("Stale");
Also applies to: 127-127, 163-163, 204-204, 244-244, 280-280, 339-339, 396-396, 441-441, 502-502, 572-572, 705-705, 750-750, 824-824, 1133-1133, 1174-1174, 1212-1212, 1252-1252, 1292-1292
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if other expire methods also skip the CheckBigKeyAndLog call
rg -A 5 "Expire.*\{" | rg -B 5 "CheckBigKeyAndLog"
Length of output: 50
Script:
#!/bin/bash
# Find expire method implementations
ast-grep --pattern 'Expire($$$) {
$$$
}'
# Also search with ripgrep to be thorough
rg -B 5 -A 10 "Status.*Expire.*\{" src/
Length of output: 19891
Script:
#!/bin/bash
# Check for CheckBigKeyAndLog usage in expire methods
rg -B 5 -A 10 "CheckBigKeyAndLog" src/storage/src/redis_*
Length of output: 83683
Script:
#!/bin/bash
# Check ListsExpire implementation
rg -B 10 -A 10 "ListsExpire.*\{" src/storage/src/redis_lists.cc
Length of output: 1730
@@ -95,6 +95,7 @@ Status Redis::HDel(const Slice& key, const std::vector<std::string>& fields, int
   }
   if (s.ok()) {
     ParsedHashesMetaValue parsed_hashes_meta_value(&meta_value);
+    CheckBigKeyAndLog(key.ToString(), parsed_hashes_meta_value.Count());
🛠️ Refactor suggestion
Verify consistent placement of CheckBigKeyAndLog calls.
The CheckBigKeyAndLog function has been added consistently across all hash operations. The placement and usage pattern matches that of the list operations, which is good for maintainability.
However, there's one inconsistency in the HashesExpire method at line 1198 where the CheckBigKeyAndLog call is placed after the stale check, unlike other methods.
Move the CheckBigKeyAndLog call before the stale check for consistency:
 if (s.ok()) {
   ParsedHashesMetaValue parsed_hashes_meta_value(&meta_value);
+  CheckBigKeyAndLog(key.ToString(), parsed_hashes_meta_value.Count());
   if (parsed_hashes_meta_value.IsStale()) {
     return Status::NotFound("Stale");
   } else if (parsed_hashes_meta_value.Count() == 0) {
     return Status::NotFound();
   }
-  CheckBigKeyAndLog(key.ToString(), parsed_hashes_meta_value.Count());
Also applies to: 163-163, 204-204, 248-248, 306-306, 400-400, 485-485, 529-529, 569-569, 634-634, 710-710, 786-786, 850-850, 914-914, 992-992, 1063-1063, 1135-1135, 1198-1198, 1239-1239, 1277-1277, 1317-1317, 1357-1357
Force-pushed from cf8c8da to d2fd434 (Compare)
Actionable comments posted: 1
🧹 Nitpick comments (4)
src/storage/src/redis.h (2)
373-376: Ensure thread-safe propagation of big key stats.

Returning a copy of the internal map is thread-safe; however, if the stats grow large, copying the entire map might become a performance bottleneck. Consider whether partial or streaming retrieval is needed (a sketch follows).
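A streaming alternative could hand each entry to a callback instead of copying the whole map, along these lines (a sketch only; this function is not part of the PR):

```cpp
#include <functional>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical streaming accessor: visits each (key, count) pair under the
// lock, avoiding a full copy. The trade-off is that slow callbacks extend
// the lock hold time, so callers should keep `fn` cheap.
void ForEachBigKeyStat(
    const std::unordered_map<std::string, int>& counts, std::mutex& mu,
    const std::function<void(const std::string&, int)>& fn) {
  std::lock_guard<std::mutex> lock(mu);
  for (const auto& entry : counts) {
    fn(entry.first, entry.second);
  }
}
```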
558-559: Be mindful of unbounded growth in big_key_access_count_.

Without a cleanup strategy, this map could grow indefinitely if many keys exceed the threshold over time. For large-scale systems, consider a periodic purge or rate-limiting to avoid memory pressure.

src/storage/include/storage/storage.h (1)
1109-1109: Clarify usage of the 'property' parameter.

The newly introduced GetBigKeyStatistics method signature includes a 'property' argument, but the existing design does not explain how it influences or filters the returned results. Consider documenting or removing this parameter if it's not used.

src/pika_admin.cc (1)
1376-1391: Consider making the property name configurable.

The code correctly aggregates big key statistics across all databases, but uses a hardcoded property name "bigkey_property". Consider making this configurable to allow for future flexibility:

- db_item.second->storage()->GetBigKeyStatistics("bigkey_property", &big_key_count);
+ db_item.second->storage()->GetBigKeyStatistics(g_pika_conf->bigkey_property_name(), &big_key_count);
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
src/pika_admin.cc
(4 hunks)src/pika_server.cc
(0 hunks)src/storage/include/storage/storage.h
(1 hunks)src/storage/src/redis.h
(2 hunks)src/storage/src/redis_hashes.cc
(22 hunks)src/storage/src/redis_lists.cc
(17 hunks)src/storage/src/redis_sets.cc
(10 hunks)src/storage/src/redis_zsets.cc
(25 hunks)src/storage/src/storage.cc
(1 hunks)
💤 Files with no reviewable changes (1)
- src/pika_server.cc
🚧 Files skipped from review as they are similar to previous changes (3)
- src/storage/src/redis_lists.cc
- src/storage/src/redis_zsets.cc
- src/storage/src/redis_hashes.cc
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: build_on_macos
- GitHub Check: Analyze (go)
- GitHub Check: build_on_centos
🔇 Additional comments (3)
src/storage/src/redis_sets.cc (1)
93-93: Adopt a configurable threshold and rate-limiting for large key logging.

These repeated calls to CheckBigKeyAndLog hardcode the threshold of 10,000. As previously discussed, allowing a configurable threshold and limiting log frequency would help manage log spam and overhead in production. Refer to the earlier improvement suggestions regarding big key monitoring thresholds, logging mechanisms, and a cleanup policy for the tracking map.

Also applies to: 173-173, 939-939, 1057-1057, 1133-1133, 1513-1513, 1593-1593
src/storage/src/redis.h (1)
362-371: Refactor suggestion for big key monitoring.

This implementation repeats a hardcoded threshold (kBigKeyThreshold) and writes to std::cerr, which may not be ideal for production. Also, consider making this logging rate-limited and the threshold configurable.

src/pika_admin.cc (1)
1094-1095: LGTM! Added new section for big key statistics.

The addition of the "Big Key Statistics" section header is clean and follows the existing format.
Status Storage::GetBigKeyStatistics(const std::string& property, uint64_t* out) {
  uint64_t result = 0;

  for (const auto& inst : insts_) {
    auto inst_result = inst->GetBigKeyStatistics();

    for (const auto& entry : inst_result) {
      result += entry.second;
    }
  }

  *out = result;
  return Status::OK();
}
🛠️ Refactor suggestion
Add error handling and parameter validation.
The implementation has several issues that should be addressed:
- The property parameter is unused in the implementation
- Missing error handling for potential failures from inst->GetBigKeyStatistics()
- Missing validation of the out pointer
Consider this improved implementation:
 Status Storage::GetBigKeyStatistics(const std::string& property, uint64_t* out) {
+  if (!out) {
+    return Status::InvalidArgument("out parameter cannot be null");
+  }
+
   uint64_t result = 0;
   for (const auto& inst : insts_) {
     auto inst_result = inst->GetBigKeyStatistics();
     for (const auto& entry : inst_result) {
       result += entry.second;
     }
   }
   *out = result;
   return Status::OK();
 }

(Propagating per-instance failures would additionally require Redis::GetBigKeyStatistics to return a Status, which its current map-returning signature does not allow.)
Committable suggestion skipped: line range outside the PR's diff.
Force-pushed from d2fd434 to fa390cd (Compare)
Summary by CodeRabbit
New Features
Chores