Commit
[docs] Bump the data size and tablets limits
Some of the changes that landed in 1.4.0, namely Todd's memory consumption and log segment improvements, plus the beginning of Adar's thread consolidation effort, make it easier for Kudu to store more data per node.

Some notes (mostly coming from Adar):
- Memory consumption now seems to be around 1.5GB / TB of data on disk after startup for a TPC-H lineitem table.
- File descriptor consumption is about 2 per log segment plus 1 per log index. Tablets with some replication lag will use more segments. On top of that comes the fd cache, which defaults to 40% of the configured max fds.
- Thread usage is about 5 per hot replica, dropping to 2 once a replica becomes cold (a new 1.4.0 concept that Todd added).

Based on the above, doubling our current limits of 4TB spread over 1000 tablets to 8TB spread over 2000 means that:
- 8TB requires at least 12GB of memory, then some more for the MRS, block cache, and scanners (around 256KB per column per scan).
- 6000 fds are required to spin up 2000 tablets, plus what the fd cache uses.
- 10k threads are required just to start Kudu.

Change-Id: Ie60d2c3548c402c6a08db9bb724bc6367db989ca
Reviewed-on: http://gerrit.cloudera.org:8080/7503
Reviewed-by: Todd Lipcon <[email protected]>
Tested-by: Todd Lipcon <[email protected]>
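
The figures in the commit message can be reproduced with a short back-of-the-envelope calculation. The sketch below is only an illustration, not part of Kudu: the function name `estimate_startup_resources` and its parameters are hypothetical, and it assumes one log segment and one log index per tablet (which is what the 6000 fd figure implies) and that every replica is hot at startup (which is what the 10k thread figure implies). It ignores the fd cache, MRS, block cache, and scanner memory.

```python
# Per-unit figures quoted in the commit message (approximate, measured on 1.4.0).
MEM_GB_PER_TB = 1.5           # memory after startup per TB of data on disk
FDS_PER_LOG_SEGMENT = 2       # file descriptors per log segment
FDS_PER_LOG_INDEX = 1         # file descriptors per log index
THREADS_PER_HOT_REPLICA = 5   # threads used by a hot replica
THREADS_PER_COLD_REPLICA = 2  # threads used by a cold replica


def estimate_startup_resources(data_tb, tablets, segments_per_tablet=1, hot_tablets=None):
    """Rough lower bounds for memory, fds, and threads at startup.

    Assumptions (not stated as a formula in the commit message):
    - each tablet has `segments_per_tablet` log segments and one log index;
    - `hot_tablets` defaults to all tablets, i.e. every replica is hot.
    """
    if hot_tablets is None:
        hot_tablets = tablets
    cold_tablets = tablets - hot_tablets

    memory_gb = data_tb * MEM_GB_PER_TB
    fds = tablets * (segments_per_tablet * FDS_PER_LOG_SEGMENT + FDS_PER_LOG_INDEX)
    threads = (hot_tablets * THREADS_PER_HOT_REPLICA
               + cold_tablets * THREADS_PER_COLD_REPLICA)
    return {"memory_gb": memory_gb, "fds": fds, "threads": threads}


old_limits = estimate_startup_resources(data_tb=4, tablets=1000)
new_limits = estimate_startup_resources(data_tb=8, tablets=2000)
print(old_limits)  # {'memory_gb': 6.0, 'fds': 3000, 'threads': 5000}
print(new_limits)  # {'memory_gb': 12.0, 'fds': 6000, 'threads': 10000}
```

Tablets with replication lag hold more log segments, so raising `segments_per_tablet` shows how quickly the fd requirement grows beyond the 6000 baseline.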