Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

upgrade to go1.20 and auto set go mem limit #5732

Merged
merged 12 commits into from
Mar 20, 2023

Conversation

butonic
Copy link
Member

@butonic butonic commented Mar 3, 2023

rebased #4856

@update-docs
Copy link

update-docs bot commented Mar 3, 2023

Thanks for opening this pull request! The maintainers of this repository would appreciate it if you would create a changelog item based on your changes.

@butonic butonic force-pushed the go-1-19-memlimit branch from a306d21 to 02f43a0 Compare March 6, 2023 16:34
@butonic
Copy link
Member Author

butonic commented Mar 6, 2023

I'm now using the std functionality of the lib together with the k6 standalone rampup tests in owncloud/cdperf#44 and with the default 2k VUS ... which still OOMs the server.

This woks fine

k6 run --env TARGET_BASE_VUS=10 --env TARGET_MAX_VUS=100 --env CLOUD_OIDC_ENABLED=true --env CLOUD_HOST=https://cloud.ocis.test --env CLOUD_LOGIN=admin --env CLOUD_PASSWORD=admin --env CLOUD_VENDOR=ocis --insecure-skip-tls-verify standalone/model-user-ramping-up-stress-test.js

But this easily kills the server (which has 32GB of ram):

k6 run --env CLOUD_OIDC_ENABLED=true --env CLOUD_HOST=https://cloud.ocis.test --env CLOUD_LOGIN=admin --env CLOUD_PASSWORD=admin --env CLOUD_VENDOR=ocis --insecure-skip-tls-verify standalone/model-user-ramping-up-stress-test.js

As soon as the stress phase begins:


running (5m21.9s), 100/200 VUs, 205 complete and 90 interrupted iterations
warmup    ✓ [======================================] 00/10 VUs    1m0s
run       ✓ [======================================] 10 VUs       1m0s
rampup    ✓ [======================================] 090/100 VUs  2m0s
stress      [=============>------------------------] 100 VUs      1m09.2s/3m0s

Virtual memory usage still increases to ~65GB, then swap (36GB) is filled up, than the kernel rids itself of the parasiteprocess by kill -9 ing it...

so ... Granted 2k concurrent clients might be a lot .... two questions:

  1. why does the default memory ratio limit of .9 not work?
  2. why are we using so much ram?

@butonic butonic force-pushed the go-1-19-memlimit branch from 02f43a0 to e5040f9 Compare March 7, 2023 13:58
@butonic butonic force-pushed the go-1-19-memlimit branch 2 times, most recently from d584df9 to f0fd87f Compare March 16, 2023 06:52
@butonic
Copy link
Member Author

butonic commented Mar 16, 2023

regarding

  1. GOMEMLIMIT acts as a 'target' memory limit, so it will actually allocate more than GOMEMLIMIT: https://weaviate.io/blog/gomemlimit-a-game-changer-for-high-memory-applications
  2. the default memory cache has no size limit and will grow indefinitely until TTLs are reached which evict items. We could use the ocmem in memory store, which allows configuring the size. Work is done in Proxy accesstoken cache store #5829

@butonic butonic changed the title upgrade to go1.19 and set go mem limit upgrade to go1.20 and auto set go mem limit Mar 16, 2023
@butonic
Copy link
Member Author

butonic commented Mar 16, 2023

@micbar cool, with go1.20 I can no longer OOM kill my ocis using k6:

$ k6 run --env CLOUD_OIDC_ENABLED=true --env CLOUD_HOST=https://cloud.ocis.test --env CLOUD_LOGIN=admin --env CLOUD_PASSWORD=admin --env CLOUD_VENDOR=ocis --insecure-skip-tls-verify standalone/model-user-ramping-up-stress-test.js

running (13m13.8s), 0000/2000 VUs, 4149 complete and 1296 interrupted iterations
warmup    ✓ [======================================] 077/100 VUs    1m0s
run       ✓ [======================================] 100 VUs        1m0s
rampup    ✓ [======================================] 0660/1000 VUs  2m0s
stress    ✓ [======================================] 1000 VUs       3m0s
recover   ✓ [======================================] 0000/1000 VUs  1m0s
normalize ✓ [======================================] 100 VUs        1m0s

     ✗ dav create status is 201
      ↳  56% — ✓ 373 / ✗ 285
     ✗ dav create status is 204
      ↳  0% — ✓ 0 / ✗ 658
     ✗ dav upload status is 201
      ↳  42% — ✓ 245 / ✗ 333
     ✗ dav move status is 201
      ↳  0% — ✓ 0 / ✗ 338
     ✗ dav delete status is 204
      ↳  22% — ✓ 63 / ✗ 214
     ✗ dav propfind status is 207
      ↳  21% — ✓ 50 / ✗ 187

     █ setup

     █ teardown

     checks.......................................: 26.62% ✓ 731       ✗ 2015  
     cloud_default_play_dav_create_error_rate.....: 1      min=1       max=1   
     cloud_default_play_dav_create_trend..........: avg=22.1s    min=130.78ms med=10.08s   max=1m0s   p(90)=55.82s   p(95)=57.37s  
     cloud_default_play_dav_delete_error_rate.....: 1      min=1       max=1   
     cloud_default_play_dav_delete_trend..........: avg=10.22s   min=116.8ms  med=10.03s   max=19.95s p(90)=12.39s   p(95)=15.6s   
     cloud_default_play_dav_move_error_rate.......: 1      min=1       max=1   
     cloud_default_play_dav_move_trend............: avg=12.14s   min=178.37ms med=10.03s   max=59.64s p(90)=15.71s   p(95)=19.85s  
     cloud_default_play_dav_propfind_error_rate...: 1      min=1       max=1   
     cloud_default_play_dav_propfind_trend........: avg=10.29s   min=231.51ms med=10.03s   max=20.23s p(90)=12.75s   p(95)=15.75s  
     cloud_default_play_dav_upload_error_rate.....: 1      min=1       max=1   
     cloud_default_play_dav_upload_trend..........: avg=22.78s   min=201.32ms med=10.07s   max=1m0s   p(90)=59.68s   p(95)=59.99s  
     data_received................................: 8.5 MB 11 kB/s
     data_sent....................................: 695 MB 876 kB/s
     http_req_blocked.............................: avg=186.83ms min=120ns    med=471ns    max=4.71s  p(90)=146.23ms p(95)=1.98s   
     http_req_connecting..........................: avg=92.35ms  min=0s       med=0s       max=1.26s  p(90)=33.14ms  p(95)=1s      
     http_req_duration............................: avg=32.16s   min=4.05ms   med=29.27s   max=1m0s   p(90)=59.99s   p(95)=1m0s    
       { expected_response:true }.................: avg=13.08s   min=4.05ms   med=4.95s    max=59.98s p(90)=39.98s   p(95)=53.8s   
     http_req_failed..............................: 65.12% ✓ 5909      ✗ 3165  
     http_req_receiving...........................: avg=1.86ms   min=0s       med=86.84µs  max=5.12s  p(90)=307.89µs p(95)=834.04µs
     http_req_sending.............................: avg=17.09ms  min=40.52µs  med=166.07µs max=6.06s  p(90)=465.99µs p(95)=14.12ms 
     http_req_tls_handshaking.....................: avg=94.37ms  min=0s       med=0s       max=4.71s  p(90)=83.5ms   p(95)=975.91ms
     http_req_waiting.............................: avg=32.14s   min=3.85ms   med=28.71s   max=1m0s   p(90)=59.99s   p(95)=1m0s    
     http_reqs....................................: 9074   11.430782/s
     iteration_duration...........................: avg=1m1s     min=1.19s    med=1m0s     max=2m13s  p(90)=1m0s     p(95)=1m1s    
     iterations...................................: 4149   5.226616/s
     vus..........................................: 0      min=0       max=1993
     vus_max......................................: 2000   min=139     max=2000

Ram usage stays below 800m on that 32gb machine. the go 1.20 garbage collector really helps. CPU peaks at 500% of avalable 800%. Lots of context canceled requests though. Well we have a request limit of 30 sec IIRC

@mmattel
Copy link
Contributor

mmattel commented Mar 16, 2023

how do we document:

With regards to the go version, pls add the info accodingly in this PR

In general:

  • For the admin docs, nothing needs to be done as no go compiler is necessary to run ocis.
  • To ask @wkloucek if there are any dependencies for Helm Charts.

For the new envvars:

OCIS_MEMORY_LIMIT_RATIO, OCIS_MEMORY_LIMIT_AMOUNT and OCIS_MEMORY_LIMIT_ENABLED

  • We need more content and notes for the impact.
    • I have not seen default values?
    • Is there a "auto" mode?
    • Can we use short forms like G, M etc
    • As there is no service, we need to improve the description at the envvars AND very likely add a section in General Information.
  • @kobergj will these envvars show up as global automatically (I do not believe so as it is not embedded in a service) or do we need to add them to Environment Variables with Special Scopes, tbd if Special or Extended.

@wkloucek
Copy link
Contributor

  • To ask @wkloucek if there are any dependencies for Helm Charts.

It will pick up the memory "limit" (note it is not a hard limit, it's a soft limit) automatically if the end users sets cgroups via https://github.com/owncloud/ocis-charts/blob/8d2666addbfa9adef233b6b53ad2ea5f97a980d2/charts/ocis/values.yaml#L358-L368

Maybe we'll expose a configuration option to tune the ratio of available memory and memory limit. But that can be done later.

@butonic
Copy link
Member Author

butonic commented Mar 17, 2023

@mmattel oops, OCIS_MEMORY_LIMIT_RATIO, OCIS_MEMORY_LIMIT_AMOUNT and OCIS_MEMORY_LIMIT_ENABLED were leftovers. They are actually no longer used. go has a GOMEMLIMIT env var, which is also used when running the compiled ocis server command. The library we add with this PR automatically sets it to 0.9, meaning 90% of free memory (or a configured cgroup). These two env vars are nowhere to be found in our code, I think it would be good enough to document that they exist and link to the official docs.

@mmattel
Copy link
Contributor

mmattel commented Mar 17, 2023

To make GOMEMLIMIT available for the standard doc process, create a new envvar like OCIS_GOMEMLIMIT in the same context like OCIS_BASE_DATA_PATH (it is a Extended Environment Variable). When done, we can use the default processes. I am not doing an extra doc process for one or two envvars...

@ownclouders
Copy link
Contributor

ownclouders commented Mar 17, 2023

💥 Acceptance test localApiTests-apiGraph-ocis failed. Further test are cancelled...

@kobergj
Copy link
Collaborator

kobergj commented Mar 17, 2023

@mmattel I don't think we need to document GOMEMLIMIT either. It is a standard golang envvar that is applicable to all go Applications. It is not an envvar we need to document or maintain. It could be deprecated or removed at any time, we have no control over it.

@mmattel
Copy link
Contributor

mmattel commented Mar 17, 2023

I don't think we need to document GOMEMLIMIT either

Just agreed with @butonic that this will not go into the admin docs but finds it way into the dev docs.

Signed-off-by: Jörn Friedrich Dreyer <[email protected]>
@butonic butonic marked this pull request as ready for review March 17, 2023 16:23
@butonic butonic requested a review from mmattel March 17, 2023 16:23
butonic added 4 commits March 17, 2023 17:53
Signed-off-by: Jörn Friedrich Dreyer <[email protected]>
Signed-off-by: Jörn Friedrich Dreyer <[email protected]>
Signed-off-by: Jörn Friedrich Dreyer <[email protected]>
Signed-off-by: Jörn Friedrich Dreyer <[email protected]>
@butonic
Copy link
Member Author

butonic commented Mar 20, 2023

@kobergj
Copy link
Collaborator

kobergj commented Mar 20, 2023

Can we exclude the issue from linting? I don't like merging red PRs

Copy link
Collaborator

@kobergj kobergj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changelog incorrect

changelog/unreleased/enhancement-memlimit.md Outdated Show resolved Hide resolved
@butonic
Copy link
Member Author

butonic commented Mar 20, 2023

Can we exclude the issue from linting? I don't like merging red PRs

no ... this has nothing to do with updating the go version. the related file was untouched ... AFAICT just sonarcloud randomly picking up files 🤷

Lets see if the new commit changes things...

Copy link
Collaborator

@kobergj kobergj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@sonarqubecloud
Copy link

SonarCloud Quality Gate failed.    Quality Gate failed

Bug C 1 Bug
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

100.0% 100.0% Coverage
0.0% 0.0% Duplication

@butonic butonic merged commit 474c4b8 into owncloud:master Mar 20, 2023
ownclouders pushed a commit that referenced this pull request Mar 20, 2023
* upgrade to go1.19 and set go mem limit

* create ocis-pkg memlimit package

* use std automemlimit import

Signed-off-by: Jörn Friedrich Dreyer <[email protected]>

* import automemlimit in every ocis service, drop ocis-pkg/memlimit package

Signed-off-by: Jörn Friedrich Dreyer <[email protected]>

* bump go to 1.20

Signed-off-by: Jörn Friedrich Dreyer <[email protected]>

* drop unused config options and env vars

Signed-off-by: Jörn Friedrich Dreyer <[email protected]>

* update all version numbers, add doc

Signed-off-by: Jörn Friedrich Dreyer <[email protected]>

* fix lint

Signed-off-by: Jörn Friedrich Dreyer <[email protected]>

* update bingo and mockery

Signed-off-by: Jörn Friedrich Dreyer <[email protected]>

* bump golangci-lint

Signed-off-by: Jörn Friedrich Dreyer <[email protected]>

* fix selector test

Signed-off-by: Jörn Friedrich Dreyer <[email protected]>

* Update changelog/unreleased/enhancement-memlimit.md

Co-authored-by: kobergj <[email protected]>

---------

Signed-off-by: Jörn Friedrich Dreyer <[email protected]>
Co-authored-by: Willy Kloucek <[email protected]>
Co-authored-by: kobergj <[email protected]>
@micbar micbar mentioned this pull request May 3, 2023
89 tasks
@ScharfViktor ScharfViktor mentioned this pull request May 4, 2023
86 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants