
very high memory usage due to kernfs_node_cache slabs #1927

Closed
redbaron opened this issue Apr 23, 2017 · 13 comments

Comments

@redbaron

Issue Report

Bug

Container Linux Version

NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1353.4.0
VERSION_ID=1353.4.0
BUILD_ID=2017-03-31-0211
PRETTY_NAME="Container Linux by CoreOS 1353.4.0 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"

Environment

AWS

Expected Behavior

Slab allocations stay at a steady level

Actual Behavior

We found high memory usage on all our etcd servers, which periodically run a healthcheck in rkt containers. rkt run with the default stage1 seems to leak kernfs_node_cache allocations.

Reproduction Steps

F=/tmp/test.uuid
while rkt --uuid-file-save=$F run quay.io/coreos/etcd --exec date;  do
   rkt rm --uuid-file=$F 
done

Then observe that the Slab: value in /proc/meminfo (grep Slab: /proc/meminfo) steadily grows.
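
For reference, a minimal sketch for watching the growth while the loop above runs (this assumes root access to /proc/slabinfo; the 30-second interval is arbitrary):

# print the overall Slab total and the kernfs_node_cache line every 30 seconds
while sleep 30; do
  date
  grep Slab: /proc/meminfo
  sudo grep '^kernfs_node_cache ' /proc/slabinfo
done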

Other Information

On a server that had been running for just over a day with the healthcheck executing every 20-30 seconds:

 Active / Total Objects (% used)    : 36046033 / 39377405 (91.5%)
 Active / Total Slabs (% used)      : 1173649 / 1173649 (100.0%)
 Active / Total Caches (% used)     : 76 / 99 (76.8%)
 Active / Total Size (% used)       : 4787598.38K / 5399366.70K (88.7%)
 Minimum / Average / Maximum Object : 0.01K / 0.14K / 8.00K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME                   
25194748 25194748 100%    0.12K 741022       34   2964088K kernfs_node_cache      
3961734 2186727  55%    0.19K 188654       21    754616K dentry                 
2591360 1825675  70%    0.06K  40490       64    161960K kmalloc-64             
1680840 1161714  69%    0.09K  40020       42    160080K kmalloc-96             
841296 840797  99%    0.04K   8248      102     32992K numa_policy            
833632 833632 100%    0.12K  26051       32    104204K kmalloc-128            
811377 811160  99%    0.58K  30051       27    480816K inode_cache            
783375 783375 100%    0.31K  31335       25    250680K kmem_cache             
783360 783360 100%    0.06K  12240       64     48960K kmem_cache_node        
571410 571410 100%    0.19K  27210       21    108840K kmalloc-192            
278599  14979   5%    0.68K  12113       23    193808K shmem_inode_cache      
274688 274327  99%    0.03K   2146      128      8584K kmalloc-32             
212394 211506  99%    0.10K   5446       39     21784K buffer_head            
104704 104618  99%    1.00K   3272       32    104704K kmalloc-1024           
 25600  24878  97%    0.02K    100      256       400K kmalloc-16             
 23868  22947  96%    0.04K    234      102       936K ext4_extent_status     
 20992  20992 100%    0.01K     41      512       164K kmalloc-8              
 15980  15980 100%    0.02K     94      170       376K scsi_data_buffer       
 15120  13729  90%    1.05K    504       30     16128K ext4_inode_cache       
 11815  11815 100%    0.05K    139       85       556K ftrace_event_field     
  9716   8680  89%    0.57K    347       28      5552K radix_tree_node        
  9216   8729  94%    0.06K    144       64       576K anon_vma_chain         
  6681   6681 100%    0.08K    131       51       524K anon_vma               
  5346   5181  96%    0.18K    243       22       972K vm_area_struct         
  5216   5216 100%    0.25K    163       32      1304K kmalloc-256            
  4200   3855  91%    0.07K     75       56       300K Acpi-Operand           
  3904   3811  97%    0.50K    122       32      1952K kmalloc-512            
  2975   2765  92%    0.63K    119       25      1904K proc_inode_cache       
  2163   2163 100%    0.19K    103       21       412K cred_jar               
  2112   2112 100%    0.06K     33       64       132K ext4_io_end            
  2032   2020  99%    2.00K    127       16      4064K kmalloc-2048           
  2010   1974  98%    0.53K     67       30      1072K idr_layer_cache        
  1242   1242 100%    0.09K     27       46       108K trace_event_file       
  1100   1100 100%    0.62K     44       25       704K sock_inode_cache       
   800    800 100%    0.12K     25       32       100K pid                    
   714    714 100%    0.12K     21       34        84K jbd2_journal_head      
   693    693 100%    0.38K     33       21       264K mnt_cache              
   512    512 100%    1.00K     16       32       512K mm_struct              
   512    512 100%    0.02K      2      256         8K jbd2_revoke_table_s    
   510    510 100%    1.06K     17       30       544K signal_cache           
   476    476 100%    0.14K     17       28        68K ext4_groupinfo_4k      
   330    293  88%    2.06K     22       15       704K sighand_cache          
   322    322 100%    0.69K     14       23       224K files_cache            
   294    294 100%    0.37K     14       21       112K blkdev_requests        
   260    215  82%    7.38K     65        4      2080K task_struct            
   256    256 100%    0.03K      2      128         8K fscrypt_info           
   232    210  90%    4.00K     29        8       928K kmalloc-4096           
   204    204 100%    0.08K      4       51        16K Acpi-State             
   170    170 100%    0.05K      2       85         8K fscrypt_ctx            
   168    168 100%    0.75K      8       21       128K task_group  
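
For completeness, a rough way to read how much memory one cache holds straight from /proc/slabinfo (just a sketch; it assumes the standard slabinfo 2.x column order of name, active_objs, num_objs, objsize):

# approximate memory held by kernfs_node_cache: num_objs * objsize
sudo awk '/^kernfs_node_cache / { printf "%.0f MiB\n", $3 * $4 / 1048576 }' /proc/slabinfo
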
@bgilbert
Contributor

bgilbert commented May 6, 2017

I was able to reproduce this in qemu. However, dropping caches with echo 3 > /proc/sys/vm/drop_caches returned kernfs_node_cache to a more reasonable size, so it appears that those allocations aren't actually leaked. Is the large cache size causing problems for you, such as additional memory pressure on the system?
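
For anyone wanting to repeat that check, a minimal sketch of the before/after comparison (needs root; note that drop_caches only releases reclaimable objects, so it would not mask a genuine leak):

# kernfs_node_cache size before dropping caches
sudo grep '^kernfs_node_cache ' /proc/slabinfo
# drop the page cache plus reclaimable dentries and inodes
sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
# compare the cache size afterwards
sudo grep '^kernfs_node_cache ' /proc/slabinfo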

@danielwhelansb

This is also an issue for us. See below for details on the setup. We are running on AWS on m4.xlarge instances.

Free memory keeps getting lower and lower. All we are running on the machine is etcd 3.1.8 and the Datadog agent. top/ps show resident memory use of around 100MB for etcd, and that is the biggest consumer; everything else is a lot smaller.

However, kernfs_node_cache was also really high. When we ran echo 3 > /proc/sys/vm/drop_caches it freed up some memory, though not all of it. There is still a chunk of memory in use that I am still investigating.

NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1353.8.0
VERSION_ID=1353.8.0
BUILD_ID=2017-05-30-2322
PRETTY_NAME="Container Linux by CoreOS 1353.8.0 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"

@bgilbert
Contributor

@chobomuffin, is this issue affecting the reliability or performance of the system, or does it appear to be cosmetic only?

@bgilbert bgilbert removed their assignment Jun 15, 2017
@danielwhelansb

Well, on our dev environment the etcd process ended up getting killed ("Out of memory: Kill process 3033 (etcd) score 27 or sacrifice child"), even though the box has 4 GiB of memory and only runs etcd.

Our staging machines have 16 GiB, so it is taking a while for them to get as low as the dev boxes did.

@bgilbert
Contributor

Could you post the kernel logs from that event?

@danielwhelansb

I can't yet; those machines have all been rebooted since. I will wait for it to happen again (it took around 5 days of uptime) and post the logs.

@bgilbert
Contributor

If the AWS instances still exist, you can get the kernel logs for the previous boot with journalctl -kb -1, for two boots ago with -kb -2, and so on.
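
As a minimal sketch, the OOM-killer messages can then be filtered out of that output (the grep pattern is only a guess at the relevant kernel log lines):

# list the boots journald knows about, then search the previous boot's kernel log
journalctl --list-boots
journalctl -k -b -1 | grep -iE 'out of memory|oom-killer|killed process'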

ytsarev pushed a commit to iflix/kube-aws that referenced this issue Jun 16, 2017
Motivation is to avoid serious memory leak on etcd nodes as in
* coreos/bugs#1927
@ytsarev

ytsarev commented Jun 16, 2017

JFYI: this is what we did to mitigate the issue in the kube-aws context: kubernetes-retired/kube-aws#705

kylehodgetts pushed a commit to HotelsDotCom/kube-aws that referenced this issue Mar 27, 2018
Motivation is to avoid serious memory leak on etcd nodes as in
* coreos/bugs#1927
@bgilbert
Contributor

bgilbert commented Oct 5, 2018

Closing due to inactivity.

@bgilbert bgilbert closed this as completed Oct 5, 2018
@amigrave

It's unfortunate that this issue was not researched further. I have the same issue (albeit not with rkt but with an in-house container system): kernfs_node_cache keeps growing until it puts too much pressure on the page table and application caches, strongly degrading performance.

If @ytsarev says that switching to Docker solves this problem, then Docker must be doing something that prevents this kernfs_node_cache growth that rkt does not.

@nunojpg

nunojpg commented Jan 16, 2020

After some review I believe the underlying issue is sshd socket activation: systemd/systemd#6567

@riking

riking commented Jul 21, 2020

This requires a kernel fix to avoid unbounded cache growth.

@amigrave

@riking: are you referring to a particular kernel fix that has already been merged in a recent version?
