ClusterSlots uses too much memory #17789
Comments
@carllin @behzadnouri What do you guys think?
Yeah, let me look into this, since it was brought up in #14366 (comment) as well.
But the bit-vec idea is promising. I will give it a shot.
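For reference, a minimal sketch of what the bit-vec idea could look like; the names NodeSlots and ClusterSlotsCompact are hypothetical, not the actual ClusterSlots types. The idea is one bit vector per node, indexed by offset from the current root, so each advertised slot costs a single bit per node instead of an entry in a per-slot hash map.

```rust
use std::collections::HashMap;

/// Hypothetical per-node slot set: a bit vector indexed by (slot - root),
/// so memory grows with the slot range, not with the number of
/// (slot, pubkey) pairs.
struct NodeSlots {
    root: u64,      // slots below this are discarded
    bits: Vec<u64>, // bit i set => node advertised slot (root + i)
}

impl NodeSlots {
    fn new(root: u64) -> Self {
        Self { root, bits: Vec::new() }
    }

    fn insert(&mut self, slot: u64) {
        if slot < self.root {
            return; // older than root: nothing to record
        }
        let offset = (slot - self.root) as usize;
        let (word, bit) = (offset / 64, offset % 64);
        if word >= self.bits.len() {
            self.bits.resize(word + 1, 0);
        }
        self.bits[word] |= 1u64 << bit;
    }

    fn contains(&self, slot: u64) -> bool {
        if slot < self.root {
            return false;
        }
        let offset = (slot - self.root) as usize;
        let (word, bit) = (offset / 64, offset % 64);
        self.bits.get(word).map_or(false, |w| w & (1u64 << bit) != 0)
    }
}

// One bit vector per node instead of one pubkey -> stake map per slot.
type Pubkey = [u8; 32]; // stand-in for solana_sdk::pubkey::Pubkey
type ClusterSlotsCompact = HashMap<Pubkey, NodeSlots>;
```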
Yeah, I was also thinking you could probably page out: […]
Testing on TDS, it seems like dropping unstaked nodes does not help much. Cluster slots still has […], and with the lowest number of staked nodes (from the same snapshot): […]
This seems to be because of crds values from nodes with different shred-versions.
excludes epoch-slots from nodes with unknown or different shred version (#17899): Inspecting the TDS gossip table shows that crds values of nodes with different shred-versions are creeping in. Their epoch-slots accumulate in ClusterSlots, causing bogus slots very far from the current root which are not purged, and so ClusterSlots keeps consuming more memory: #17789, #14366 (comment), #14366 (comment). This commit updates ClusterInfo::get_epoch_slots and discards entries from nodes with an unknown or different shred-version. Follow-up commits will patch gossip not to waste bandwidth and memory on crds values of nodes with a different shred-version.
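Roughly, the filtering step this commit describes could look like the sketch below; the types and field names are simplified stand-ins for the crds/ClusterInfo structures, not the actual API.

```rust
use std::collections::HashMap;

type Pubkey = [u8; 32]; // stand-in for solana_sdk::pubkey::Pubkey

struct EpochSlots {
    from: Pubkey, // origin node of this epoch-slots value
    // compressed slot ranges elided
}

/// Keep only epoch-slots whose origin advertises the same shred-version as
/// this node; entries from unknown or mismatched nodes are discarded so
/// they never reach ClusterSlots.
fn filter_epoch_slots(
    epoch_slots: Vec<EpochSlots>,
    shred_versions: &HashMap<Pubkey, u16>, // origin -> advertised shred-version
    self_shred_version: u16,
) -> Vec<EpochSlots> {
    epoch_slots
        .into_iter()
        .filter(|es| shred_versions.get(&es.from) == Some(&self_shred_version))
        .collect()
}
```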
Reopening this since […]
Checking on testnet, all the erroneous epoch-slots are from: […]
Probably the node does not halt because […]. However […]
On testnet: […]
On mainnet: […]
#19190 should mitigate the above situation, where a node is using the same identity key for the testnet and mainnet clusters. Also announced on Discord for those nodes to change their identity keys.
Cross-cluster gossip contamination is causing the cluster-slots hash map to contain a lot of bogus values and consume too much memory: #17789. If a node is using the same identity key across clusters, then these erroneous values might not be filtered out by the shred-version check, because one of the variants of the contact-info will have a matching shred-version: #17789 (comment). The cluster-slots hash map is bounded and trimmed at the lower end by the current root. This commit also discards slots that are epochs ahead of the root.
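A sketch of the trimming described above, with stand-in types rather than the real ClusterSlots internals: bound the map below by the root and drop anything more than an epoch ahead of it, so far-future slots from cross-cluster contamination cannot accumulate.

```rust
use std::collections::{BTreeMap, HashMap};

type Slot = u64;
type Pubkey = [u8; 32]; // stand-in for solana_sdk::pubkey::Pubkey
type SlotPubkeys = HashMap<Pubkey, u64>; // nodes that have the slot -> their stake

/// Keep only slots in (root, root + slots_in_epoch]: anything at or below
/// the root is already finalized, and anything a whole epoch ahead is
/// almost certainly bogus (e.g. cross-cluster gossip contamination).
fn trim_cluster_slots(
    cluster_slots: &mut BTreeMap<Slot, SlotPubkeys>,
    root: Slot,
    slots_in_epoch: Slot,
) {
    cluster_slots
        .retain(|&slot, _| root < slot && slot <= root.saturating_add(slots_in_epoch));
}
```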
Problem
Proposed Solution
Can we reduce CLUSTER_SLOTS_TRIM_SIZE? Or find a more efficient way to store this information, e.g. use a map of pubkey -> BitVec or compressed BitVec?
Page out to disk in files or blockstore?
Consumers:
replay stage -> propagation stats
repair service -> pruning slots
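For context on these consumers, a hedged sketch of the kind of query they run against this structure (stand-in types, not the actual replay or repair code): given the per-slot pubkey -> stake map, compute what fraction of stake has advertised a slot, which replay can use for propagation stats and repair can use to decide which slots still need repairing.

```rust
use std::collections::{BTreeMap, HashMap};

type Slot = u64;
type Pubkey = [u8; 32]; // stand-in for solana_sdk::pubkey::Pubkey

/// Stake-weighted fraction of the cluster that has advertised `slot`
/// through epoch-slots gossip.
fn propagated_stake_fraction(
    cluster_slots: &BTreeMap<Slot, HashMap<Pubkey, u64>>,
    slot: Slot,
    total_stake: u64,
) -> f64 {
    let confirmed: u64 = cluster_slots
        .get(&slot)
        .map(|nodes| nodes.values().sum())
        .unwrap_or(0);
    confirmed as f64 / total_stake.max(1) as f64
}
```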