Profile Che-Theia to determine the correct memory/cpu resources #18565
Comments
After the detailed investigation, we figured out that Che-Theia's @vzhukovs please provide more details on your investigation results. |
Here are the results of investigating the memory and CPU configuration. All measurements used a workspace based on eclipse/che-devfile-registry/devfiles/java-mysql/devfile.yaml. Metrics were gathered with Prometheus and Grafana.
Measurement 1 (with default configuration):
Here we can observe general resource usage for the whole workspace:
Here we can observe resource usage for the Theia container:
At workspace startup, Theia sometimes goes into offline mode.
Measurement 2:
Here we can observe general resource usage for the whole workspace:
Here we can observe resource usage for the Theia container:
The offline-mode behavior is the same as in the first measurement. The memory request does not affect the web socket connection.
Measurement 3:
Here we can observe resource usage for the whole workspace:
Here we can observe how the Theia container is throttled at workspace startup:
Theia starts very slowly. Throttling affects the web socket connection by increasing the time it takes to send and receive web socket messages over the channel, so the offline message appears more often than usual.
Measurement 4:
Here we can see that setting the CPU limit to 0.5 affects the Theia container:
Measurement 5:
The same throttling occurs as in Measurement 4:
Measurement 6:
The workspace failed to start with this configuration. Got error:
The ping request time was also measured during these measurements.
As we can see, tuning
Digging into the connection status service revealed the following problem. A web socket activity handler sets up timer [1]. When the first message arrives on the channel, the handler waits 4 seconds and then sends a ping. If the ping succeeds, the connection status service sets up timer [2], which switches the connection to offline after 5 seconds. If another web socket message arrives in the meantime, the connection status service schedules another ping, and this ping may be slow (over 1 second). During that second, timer [2] fires and Theia goes into offline mode. Then the response to the second ping from timer [1] finally arrives, and Theia immediately switches back to online mode. So the problem is that the two timers do not track whether a ping promise is still pending.
This issue can be reproduced on vanilla Theia when the user opens a large file in the editor (more than 5-7 MB, depending on the user's host). In that case the web socket channel is busy transmitting the file content and the ping service cannot operate properly. There are two possible solutions:
|
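The two-timer race described in the comment above can be illustrated with a small, deterministic simulation. This is only a sketch under stated assumptions: it is not Theia's actual connection status code, and the names `simulate`, `Activity`, `Transition`, `PING_DELAY_MS`, and `OFFLINE_TIMEOUT_MS` are hypothetical. The 4 second and 5 second values are taken from the description above; everything else is a simplified model.

```typescript
// Simplified model of the two timers described above.
// All names are hypothetical; this is NOT Theia's implementation.
type Status = "online" | "offline";

interface Activity {
    at: number;            // time (ms) at which a web socket message arrives
    pingLatencyMs: number; // how long the ping triggered by this message takes
}

interface Transition {
    at: number;
    status: Status;
}

const PING_DELAY_MS = 4000;      // timer [1]: delay between activity and ping
const OFFLINE_TIMEOUT_MS = 5000; // timer [2]: delay before switching to offline

function simulate(activities: Activity[]): Transition[] {
    // Timer [1] is re-armed on every message, so a ping is only sent
    // if no newer message arrived before the timer fired.
    const responses = activities
        .filter((a, i) => {
            const next = activities[i + 1];
            return next === undefined || next.at >= a.at + PING_DELAY_MS;
        })
        .map(a => a.at + PING_DELAY_MS + a.pingLatencyMs)
        .sort((x, y) => x - y);

    const transitions: Transition[] = [];
    let status: Status = "online";
    let offlineDeadline: number | undefined;

    for (const responseAt of responses) {
        // Timer [2] does not know a ping is still in flight, so it fires
        // if its deadline passes before the next ping response arrives.
        if (offlineDeadline !== undefined && offlineDeadline < responseAt) {
            status = "offline";
            transitions.push({ at: offlineDeadline, status });
        }
        // A successful ping response switches back to online and re-arms timer [2].
        if (status === "offline") {
            status = "online";
            transitions.push({ at: responseAt, status });
        }
        offlineDeadline = responseAt + OFFLINE_TIMEOUT_MS;
    }
    // Transitions after the last ping response are intentionally not modelled.
    return transitions;
}

// First ping is fast (50 ms); the second one takes 1200 ms.
console.log(simulate([
    { at: 0, pingLatencyMs: 50 },
    { at: 4100, pingLatencyMs: 1200 },
]));
```

In this model the first ping response arrives at 4050 ms and arms timer [2] for 9050 ms. The second ping is sent at 8100 ms and answers at 9300 ms, so the status flips to offline at 9050 ms and back to online at 9300 ms. With a fast second ping (for example 50 ms) the returned list is empty, which matches the observation that the blip depends on ping latency and not on the connection actually being lost.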
Thank you @vzhukovs for the great benchmarking! So, setting cpuLimit to any of 100m, 500m, or 900m slows down Che-Theia significantly. |
As we've figured out, Offline mode issue cannot be fixed by tuning the cpu/mem resources,
I believe a better option would be to introduce a Theia configuration parameter upstream and just set it in Che-Theia, e.g. here. It would give us more flexibility. |
|
What I observe is that:
kubectl get po -l 'che.original_name=workspace' -o json -w |
  jq -r 'if .spec then .spec.containers[] |
      "---", .name, .resources.requests.cpu, .resources.limits.cpu
    else "no pod yet" end'
|
I tried applying different configurations to the Theia container, with mixed outcomes: But visually, setting cpuLimit to 1.5 still somehow affects the web socket communication, and Theia continues to show offline mode. We could set cpuLimit to 1.5 and cpuRequest to 0.5-0.75 by default, but this will not give correct feedback on the web socket communication channel. The connection status checking mechanism should be reviewed upstream. Setting cpuLimit or cpuRequest to more than 2 is probably not a good idea, because Che is usually started on a default minikube configuration (when a developer runs it on the host machine), and the default configuration tries to allocate a CPU limit of 2 for the whole local cluster. This might also be related to the number of physical cores, but I am not sure about that.
I tried the provided command and saw no difference between its output and what Grafana shows on its charts in real time. |
@vzhukovs what's your conclusion? What are the CPU request and limit values for Che Theia? |
@l0rd sorry for the late response. From what I've got, using the local installation and che.openshift.io, configuration with |
@vzhukovs please provide a PR for setting the resources for Che-Theia based on your investigations within this issue. |
The |
Is your task related to a problem? Please describe.
When starting a Workspace, Che-Theia shows it’s in Offline mode.
Offline indicator is displayed for a short period of time, but it makes a bad UX.
Offline indicator means that Che-Theia backend didn’t respond to a ping request within the timeout.
Most likely, it’s because we don’t have the memoryRequest/cpuLimit specified for Che-Theia sidecar.
Describe the solution you'd like
We need to profile Che-Theia to determine the correct memoryRequest/cpuLimit values to set for Che-Theia sidecar.
Describe alternatives you've considered
Additional context
kubectl top can be used for profiling a sidecar.
cpuRequest and memoryRequest to the che-theia plugin meta.yaml che-plugin-registry#716