
very high memory usage due to kernfs_node_cache slabs #1927

Closed
redbaron opened this issue Apr 23, 2017 · 13 comments

Comments

@redbaron

Issue Report

Bug

Container Linux Version

NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1353.4.0
VERSION_ID=1353.4.0
BUILD_ID=2017-03-31-0211
PRETTY_NAME="Container Linux by CoreOS 1353.4.0 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"

Environment

AWS

Expected Behavior

Slab allocations stay at a steady level

Actual Behavior

We found high memory usage on all our etcd servers, which periodically run a healthcheck in rkt containers. rkt run with the default stage1 seems to leak kernfs_node_cache allocations.

Reproduction Steps

F=/tmp/test.uuid
while rkt --uuid-file-save=$F run quay.io/coreos/etcd --exec date;  do
   rkt rm --uuid-file=$F 
done

Then observe that the Slab: value in /proc/meminfo (grep Slab: /proc/meminfo) steadily grows.
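
For reference, a minimal sketch for watching the growth while the loop above runs (this assumes root access to /proc/slabinfo; the 30-second interval is arbitrary):

# print the overall Slab total and the kernfs_node_cache line every 30 seconds
while sleep 30; do
  date
  grep Slab: /proc/meminfo
  sudo grep '^kernfs_node_cache ' /proc/slabinfo
done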

Other Information

On a server that had been running for just over a day with the healthcheck executing every 20-30 seconds:

 Active / Total Objects (% used)    : 36046033 / 39377405 (91.5%)
 Active / Total Slabs (% used)      : 1173649 / 1173649 (100.0%)
 Active / Total Caches (% used)     : 76 / 99 (76.8%)
 Active / Total Size (% used)       : 4787598.38K / 5399366.70K (88.7%)
 Minimum / Average / Maximum Object : 0.01K / 0.14K / 8.00K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME                   
25194748 25194748 100%    0.12K 741022       34   2964088K kernfs_node_cache      
3961734 2186727  55%    0.19K 188654       21    754616K dentry                 
2591360 1825675  70%    0.06K  40490       64    161960K kmalloc-64             
1680840 1161714  69%    0.09K  40020       42    160080K kmalloc-96             
841296 840797  99%    0.04K   8248      102     32992K numa_policy            
833632 833632 100%    0.12K  26051       32    104204K kmalloc-128            
811377 811160  99%    0.58K  30051       27    480816K inode_cache            
783375 783375 100%    0.31K  31335       25    250680K kmem_cache             
783360 783360 100%    0.06K  12240       64     48960K kmem_cache_node        
571410 571410 100%    0.19K  27210       21    108840K kmalloc-192            
278599  14979   5%    0.68K  12113       23    193808K shmem_inode_cache      
274688 274327  99%    0.03K   2146      128      8584K kmalloc-32             
212394 211506  99%    0.10K   5446       39     21784K buffer_head            
104704 104618  99%    1.00K   3272       32    104704K kmalloc-1024           
 25600  24878  97%    0.02K    100      256       400K kmalloc-16             
 23868  22947  96%    0.04K    234      102       936K ext4_extent_status     
 20992  20992 100%    0.01K     41      512       164K kmalloc-8              
 15980  15980 100%    0.02K     94      170       376K scsi_data_buffer       
 15120  13729  90%    1.05K    504       30     16128K ext4_inode_cache       
 11815  11815 100%    0.05K    139       85       556K ftrace_event_field     
  9716   8680  89%    0.57K    347       28      5552K radix_tree_node        
  9216   8729  94%    0.06K    144       64       576K anon_vma_chain         
  6681   6681 100%    0.08K    131       51       524K anon_vma               
  5346   5181  96%    0.18K    243       22       972K vm_area_struct         
  5216   5216 100%    0.25K    163       32      1304K kmalloc-256            
  4200   3855  91%    0.07K     75       56       300K Acpi-Operand           
  3904   3811  97%    0.50K    122       32      1952K kmalloc-512            
  2975   2765  92%    0.63K    119       25      1904K proc_inode_cache       
  2163   2163 100%    0.19K    103       21       412K cred_jar               
  2112   2112 100%    0.06K     33       64       132K ext4_io_end            
  2032   2020  99%    2.00K    127       16      4064K kmalloc-2048           
  2010   1974  98%    0.53K     67       30      1072K idr_layer_cache        
  1242   1242 100%    0.09K     27       46       108K trace_event_file       
  1100   1100 100%    0.62K     44       25       704K sock_inode_cache       
   800    800 100%    0.12K     25       32       100K pid                    
   714    714 100%    0.12K     21       34        84K jbd2_journal_head      
   693    693 100%    0.38K     33       21       264K mnt_cache              
   512    512 100%    1.00K     16       32       512K mm_struct              
   512    512 100%    0.02K      2      256         8K jbd2_revoke_table_s    
   510    510 100%    1.06K     17       30       544K signal_cache           
   476    476 100%    0.14K     17       28        68K ext4_groupinfo_4k      
   330    293  88%    2.06K     22       15       704K sighand_cache          
   322    322 100%    0.69K     14       23       224K files_cache            
   294    294 100%    0.37K     14       21       112K blkdev_requests        
   260    215  82%    7.38K     65        4      2080K task_struct            
   256    256 100%    0.03K      2      128         8K fscrypt_info           
   232    210  90%    4.00K     29        8       928K kmalloc-4096           
   204    204 100%    0.08K      4       51        16K Acpi-State             
   170    170 100%    0.05K      2       85         8K fscrypt_ctx            
   168    168 100%    0.75K      8       21       128K task_group  
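
For completeness, a rough way to read how much memory one cache holds straight from /proc/slabinfo (just a sketch; it assumes the standard slabinfo 2.x column order of name, active_objs, num_objs, objsize):

# approximate memory held by kernfs_node_cache: num_objs * objsize
sudo awk '/^kernfs_node_cache / { printf "%.0f MiB\n", $3 * $4 / 1048576 }' /proc/slabinfo
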
@bgilbert
Contributor

bgilbert commented May 6, 2017

I was able to reproduce this in qemu. However, dropping caches with echo 3 > /proc/sys/vm/drop_caches returned kernfs_node_cache to a more reasonable size, so it appears that those allocations aren't actually leaked. Is the large cache size causing problems for you, such as additional memory pressure on the system?
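
For anyone wanting to repeat that check, a minimal sketch of the before/after comparison (needs root; note that drop_caches only releases reclaimable objects, so it would not mask a genuine leak):

# kernfs_node_cache size before dropping caches
sudo grep '^kernfs_node_cache ' /proc/slabinfo
# drop the page cache plus reclaimable dentries and inodes
sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
# compare the cache size afterwards
sudo grep '^kernfs_node_cache ' /proc/slabinfo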

@danielwhelansb

This is also an issue for us. See below for details on the setup. We are running on AWS on m4.xlarge instances.

Free memory keeps getting lower and lower. All we are running on the machine is etcd 3.1.8 and the Datadog agent. top/ps show resident memory use of around 100MB for etcd, and that is the biggest consumer; everything else is a lot smaller.

However, kernfs_node_cache was also really high. When we ran echo 3 > /proc/sys/vm/drop_caches it freed up some memory, though not all of it. There is still a chunk of memory in use that I am still investigating.

NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1353.8.0
VERSION_ID=1353.8.0
BUILD_ID=2017-05-30-2322
PRETTY_NAME="Container Linux by CoreOS 1353.8.0 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"

@bgilbert
Contributor

@chobomuffin, is this issue affecting the reliability or performance of the system, or does it appear to be cosmetic only?

@bgilbert bgilbert removed their assignment Jun 15, 2017
@danielwhelansb

Well, on our dev environment the etcd process ended up getting killed ("Out of memory: Kill process 3033 (etcd) score 27 or sacrifice child"), even though the box has 4 GiB of memory and only runs etcd.

Our staging machines have 16 GiB, so it is taking a while for them to get as low as the dev boxes did.

@bgilbert
Contributor

Could you post the kernel logs from that event?

@danielwhelansb

I can't yet; those machines have all been rebooted since. I will wait for it to happen again (it took around 5 days of uptime) and post the logs.

@bgilbert
Contributor

If the AWS instances still exist, you can get the kernel logs for the previous boot with journalctl -kb -1, for two boots ago with -kb -2, and so on.
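
As a minimal sketch, the OOM-killer messages can then be filtered out of that output (the grep pattern is only a guess at the relevant kernel log lines):

# list the boots journald knows about, then search the previous boot's kernel log
journalctl --list-boots
journalctl -k -b -1 | grep -iE 'out of memory|oom-killer|killed process'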

ytsarev pushed a commit to iflix/kube-aws that referenced this issue Jun 16, 2017
Motivation is to avoid serious memory leak on etcd nodes as in
* coreos/bugs#1927
@ytsarev

ytsarev commented Jun 16, 2017

JFYI: this is what we did to mitigate the issue in the kube-aws context: kubernetes-retired/kube-aws#705

kylehodgetts pushed a commit to HotelsDotCom/kube-aws that referenced this issue Mar 27, 2018
Motivation is to avoid serious memory leak on etcd nodes as in
* coreos/bugs#1927
@bgilbert
Contributor

bgilbert commented Oct 5, 2018

Closing due to inactivity.

@bgilbert bgilbert closed this as completed Oct 5, 2018
@amigrave

It's unfortunate that this issue was not researched further. I have the same issue (albeit not with rkt but with an in-house container system): kernfs_node_cache keeps growing until it puts too much pressure on the page table and application caches, strongly degrading performance.

If @ytsarev says that switching to Docker solves this problem, then Docker must be doing something that prevents this kernfs_node_cache growth that rkt does not.

@nunojpg

nunojpg commented Jan 16, 2020

After some review I believe the underlying issue is sshd socket activation: systemd/systemd#6567

@riking

riking commented Jul 21, 2020

This requires a kernel fix to avoid unbounded cache growth.

@amigrave

@riking: are you referring to a particular kernel fix that has already been merged in a recent version?
