High CPU use due to new thread created on every websocket message in DispatchedMessageSink #6566
Well, the high CPU issue is back for me even with the thread pool modification, so this may not be a Jetty issue, although I still question why a new thread is created in the class above instead of using a thread pool.
If you are using java.io stream based reads (e.g. InputStream, Reader), then per spec we are required to spawn a new thread for each incoming message. That style of read is for VERY long lived messages of undetermined length; think gaming, or video chat. It is inappropriate to use for short messages. (How short is short? Well, that depends on a huge number of factors, from your hardware, memory, OS, network, concurrent connections, level of load, etc. But anything that can be sent in under 5 minutes is a metric I personally use.)
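To make the distinction concrete, here is a minimal sketch using the Jakarta WebSocket annotated API (the same distinction applies to Jetty's native @WebSocket annotations; the endpoint paths and class names below are made up for illustration):

import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import jakarta.websocket.OnMessage;
import jakarta.websocket.server.ServerEndpoint;

// Streaming style: the container hands the application an InputStream before the whole
// message has arrived, so every incoming message needs a thread that can block in read().
// Appropriate only for very long / unbounded messages.
@ServerEndpoint("/stream")
class StreamingEndpoint {
    @OnMessage
    public void onMessage(InputStream stream) throws IOException {
        byte[] chunk = new byte[4096];
        while (stream.read(chunk) != -1) {
            // process each chunk as it arrives
        }
    }
}

// Whole-message style: the container delivers the complete payload in one call,
// so no dedicated per-message thread is required.
@ServerEndpoint("/short")
class BufferEndpoint {
    @OnMessage
    public void onMessage(ByteBuffer message) {
        // process message.remaining() bytes
    }
}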
Thank you so very much for taking the time to respond and explain, really. I had no idea WebSocket.onMessage(InputStream) vs non-stream methods made such a difference in implementation. I am now using onMessage(ByteBuffer), and this has solved the problem of creating a new thread for every message. I am still, unfortunately, having the issue of very high CPU since switching from Jetty 9 (15% CPU) to Jetty 11 (60% CPU), so the hunt continues, but presumably it is not related to this topic. Thanks again.
Was the only change Jetty 9 -> 11?
The change was Jetty 9 -> 11, but also from non-embedded to embedded. I need to have embedded, so I am now in the process of downgrading from 11 to 9 embedded. At this point I am trying to maintain the performance of 9.4 while just changing to embedded.
Also, thank you for the tips on dumping thread info! That's gold.
The only change made now is from Jetty 11 embedded to 9.4 embedded, and CPU went from 60% to 15%. Something is going on; I have no idea what. Here is the server startup code that is used:
Server server = new Server(new QueuedThreadPool(500,30,99999));
HttpConfiguration httpConfig = new HttpConfiguration();
httpConfig.setIdleTimeout((Command.clientDataSyncIntervalSeconds + 30) * 1000);
httpConfig.setSendServerVersion(false);
httpConfig.setSendServerVersion(false);
httpConfig.setPersistentConnectionsEnabled(true);
HttpConnectionFactory http11 = new HttpConnectionFactory(httpConfig);
//supports translating client IP address when behind load balancers
ProxyConnectionFactory proxy = new ProxyConnectionFactory(http11.getProtocol());
ServerConnector connector = new ServerConnector(server, proxy, http11);
connector.setPort(8080);
server.addConnector(connector);
var servletHandler = new ServletContextHandler(server,"/");
servletHandler.setBaseResource(Resource.newClassPathResource("/web"));
//JettyWebSocketServletContainerInitializer.configure(servletHandler, null);
servletHandler.addServlet(MainServlet.class, "").setAsyncSupported(true);
servletHandler.addServlet(LoginServlet.class, "/login");
servletHandler.addServlet(StatisticsServlet.class, "/stats");
servletHandler.addServlet(BlockedServlet.class, "/blocked");
servletHandler.addServlet(AdminServlet.class, "/admin");
servletHandler.addServlet(PingServlet.class, "/ping");
servletHandler.addServlet(MapServlet.class, "/map");
servletHandler.addServlet(ConsoleLogServlet.class, "/log");
servletHandler.addServlet(DefaultServlet.class,"/");
servletHandler.addFilter(RedirectFilter.class, "/*", EnumSet.of(DispatcherType.REQUEST)).setAsyncSupported(true);
FilterHolder filterJavaMelody = servletHandler.addFilter(MonitoringFilter.class, "/*", EnumSet.of(DispatcherType.REQUEST));
filterJavaMelody.setAsyncSupported(true);
filterJavaMelody.setInitParameter("system-actions-enabled","false");
filterJavaMelody.setInitParameter("disable","false");
filterJavaMelody.setInitParameter("gzip-compression-disabled","true");
filterJavaMelody.setInitParameter("no-database","true");
filterJavaMelody.setInitParameter("displayed-counters","http,error,log");
// Link the context to the server.
GzipHandler gzipHandler = new GzipHandler();
gzipHandler.setMinGzipSize(1024);
gzipHandler.addIncludedMethods("GET");
gzipHandler.addIncludedMethods("POST");
gzipHandler.setHandler(servletHandler);
server.setHandler(gzipHandler);
initServices();
server.start();
OS: Linux 4.14.238-182.422.amzn2.aarch64 (2 cores). JVM option: -XX:-OmitStackTraceInFastThrow
@nezoat can you also report the heap size? In the graphs above, at what time did you switch to Jetty 9.4.x?
Just some minor things noticed in your code (which has nothing to do with your issue) ...
httpConfig.setSendServerVersion(false) appears twice; did you mean to set that twice?
httpConfig.setPersistentConnectionsEnabled(true) is true by default. I mention this detail as it has confused people on Stack Overflow for years, who think that it forces persistent connections. It merely adds a header on the response to indicate the server's desire for persistence, which the client and/or application can easily override.
For the ProxyConnectionFactory / client IP handling, it sounds like you really want the ForwardedRequestCustomizer.
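For reference, a minimal sketch of that change, assuming the Server and HttpConfiguration from the startup code above (ForwardedRequestCustomizer is in org.eclipse.jetty.server and interprets X-Forwarded-For / Forwarded headers, whereas ProxyConnectionFactory is for the PROXY protocol used by some load balancers):

import org.eclipse.jetty.server.ForwardedRequestCustomizer;
import org.eclipse.jetty.server.HttpConfiguration;
import org.eclipse.jetty.server.HttpConnectionFactory;
import org.eclipse.jetty.server.ServerConnector;

HttpConfiguration httpConfig = new HttpConfiguration();
// Report the real client address taken from the X-Forwarded-* headers added by the load balancer.
httpConfig.addCustomizer(new ForwardedRequestCustomizer());
HttpConnectionFactory http11 = new HttpConnectionFactory(httpConfig);
ServerConnector connector = new ServerConnector(server, http11); // server is the Server built above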
Heap is 9G. The switch from 11 --> 9.4.x on the CPU graph is around 18:20, and on the GC graph it's just past 22:00 hrs.
Not sure I understand. Are you saying that the switch happened at 18:20 on the CPU graph, but at 22:00 on the GC graph? What load are you running? We don't see this high CPU in our load tests. What's your application doing?
@joakime Thank you very much, yes I think ForwardedRequestCustomizer() is what I was looking for. My server code is still checking X-FORWARDED-FOR headers if present, so I had not yet followed up to see if the proxy code was handling this for me, but it appears ForwardedRequestCustomizer is what I actually need. I've made the other code adjustments based on your comments as well.
@sbordet Sorry for the confusion, I should have clarified: the graphs are from different apps but they are of the same run, and the correlation of the CPU dropping and GC going to nearly zero is still correct. The CPU graph is from an Amazon EC2 CloudWatch dashboard, is in my local time, and is capturing aggregate data of all three servers in a cluster. The green GC chart is from JavaMelody, is in UTC-0 time, and is of only one server directly. 1800 on the CPU chart is the same as 2200 on the GC chart. I made the downgrade from 11 to 9 on each server, which takes about 10 minutes for the reboot on each, so the CPU graph has a couple of bumps from reboots before it settles down to a smooth 15% average across all servers. I should also note that I tried changing the GC from ZGC to Shenandoah and still have high CPU issues with Jetty 11.
@nezoat did you actually do the required code change when going from Jetty 9.4.x to Jetty 11.x? (due to the Jakarta "Big Bang") |
The app handles the server side of a chess app, syncing moves between players, profile data, lists of games, etc. Messages are all simple JSON requests (HTTP POST for non-websocket clients, websocket for all others). The load in the graphs is from the live production servers. Each of the three servers handles around 2500 requests per minute. Yes, the GC load increases over time, but that's partly due to the daily cycle of load increasing at peak times of day. The CPU load was fine with Jetty 11 during low traffic, but seemed to get exponentially worse as traffic picked up and then level off to a steady state around 60%. With Jetty 9, GC stays below 2% all the time. Just to let you know, I am OK with staying on Jetty 9 for a couple more years; I have no real reason to go to Jetty 11 right now, so no need to solve this problem for me, but it does seem to be some kind of issue between GC and Jetty 11. Maybe it is related to the reduced thread stack I'm using, 256K, or the ARM64-based Linux on the Amazon servers. Maybe it is an edge case that isn't showing up in your normal load tests.
And this is the basics of the MainServlet where the WebSocket upgrade occurs:
public class MainServlet extends WebSocketServlet {
    // ... (@Override configure(...) and the @WebSocket endpoint were elided in the original comment)
} // end servlet
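For readers following along, a Jetty 9.4-style WebSocketServlet typically looks roughly like the sketch below (the actual servlet body was elided in the comment above; MainSocket is a placeholder name for the @WebSocket-annotated endpoint class, and in Jetty 10/11 the equivalents are JettyWebSocketServlet and JettyWebSocketServletFactory):

import org.eclipse.jetty.websocket.servlet.WebSocketServlet;
import org.eclipse.jetty.websocket.servlet.WebSocketServletFactory;

public class MainServlet extends WebSocketServlet {
    @Override
    public void configure(WebSocketServletFactory factory) {
        // Register the @WebSocket-annotated endpoint that handles the upgraded connections.
        factory.register(MainSocket.class); // MainSocket is a placeholder for the application's endpoint class
    }
}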
Yes, all my servlets were updated with the new base class and method parameters due to the Jakarta namespace change for Jetty 11, and then switched back to javax for Jetty 9.4.
@nezoat we need more information.
I'll see if I can get a simple client/server example set up with load to reproduce it.
Not a perfect test, but the attached is three projects: a client, a Jetty 9 server, and a Jetty 11 server. The servers just echo the data back to the client. The client starts a couple of threads and loops sending 50k random bytes. I ran it long enough to see that the issue does still show up in this test, most noticeably in the GC. The CPU shows it a small amount, but these charts had the client and server running on the same machine, so much of the CPU is due to the client. If you haven't seen any problems in your load tests, I'm guessing the issue is related to ZGC and/or Amazon's ARM Linux. This was tested on an Amazon Linux arm64 r6g.large. Nothing else on the AMI, and no changes other than increased open-file limits, which shouldn't affect anything here.
java -jar -XX:+UseZGC -XX:ThreadStackSize=256k jetty-test-server9.jar
The Jetty 11 test was run from about 1800 -> 1900, and Jetty 9 immediately after for the rest of the run (about 1900 -> 2000). Also, this is OpenJDK 15.0.1+9-18, 64-bit.
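For a rough idea of what such an echo endpoint could look like with Jetty's native annotated API (a sketch only; the class name is an assumption, not taken from the attached projects):

import java.io.IOException;
import java.nio.ByteBuffer;
import org.eclipse.jetty.websocket.api.Session;
import org.eclipse.jetty.websocket.api.annotations.OnWebSocketMessage;
import org.eclipse.jetty.websocket.api.annotations.WebSocket;

@WebSocket
public class EchoSocket {
    // Echo every binary message straight back to the sender.
    @OnWebSocketMessage
    public void onBinary(Session session, byte[] payload, int offset, int length) throws IOException {
        session.getRemote().sendBytes(ByteBuffer.wrap(payload, offset, length));
    }
}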
Issue #6566 - add executor to WebSocketComponents & use for Dispatched Messages
Reopening, as GitHub closed this issue due to the merge of the speculative PR #6585.
@lachlan-roberts for "large messages", say 50000 bytes as in this issue, in both Jetty 9.4 and 10 a 4k network buffer is used. In Jetty 10 with [...]. With [...]. So there is one more copy when [...].
I think the only necessary copy is the one the implementation must do before calling the application; this is necessary to avoid the application retaining a buffer that we may pool or recycle.
In the best case we should do zero copies: read from the network, get the frame with a slice of the payload (or a synthetic frame), and call the application with that payload when the callback form allows it.
We need 2 copies if we want to guarantee that the application does not mess with our buffers that we may pool/recycle, so we do a second, defensive copy, passing to the application a throw-away buffer.
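To make the copy-count discussion concrete, here is a hypothetical illustration (not actual Jetty internals) of the "defensive copy" option: the implementation copies the pooled payload into a throw-away buffer before handing it to the application, so the pooled buffer can be recycled immediately.

import java.nio.ByteBuffer;
import java.util.function.Consumer;

// Hypothetical helper, for illustration only.
final class DefensiveDelivery {
    static void deliver(ByteBuffer pooledPayload, Consumer<ByteBuffer> application) {
        // Copy the payload out of the implementation-owned (poolable) buffer ...
        ByteBuffer copy = ByteBuffer.allocate(pooledPayload.remaining());
        copy.put(pooledPayload).flip();
        // ... so the application may retain or mutate what it received without touching the pool.
        application.accept(copy);
    }
}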
Definitely let's look at reducing copying... but also, 4K is either a massively too-large buffer for many ws messages, or an entirely too-small one for many others. Can we be a bit more dynamic in choosing the size, or at least make it configurable?
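For what it's worth, I believe the read buffer size is already adjustable per application via the WebSocket policy; a Jetty 9.4-style sketch (the 16 KiB value is only an example, and EchoSocket is the placeholder endpoint from the sketch above):

import org.eclipse.jetty.websocket.servlet.WebSocketServlet;
import org.eclipse.jetty.websocket.servlet.WebSocketServletFactory;

public class TunedServlet extends WebSocketServlet {
    @Override
    public void configure(WebSocketServletFactory factory) {
        // Raise the input (network read) buffer from the 4 KiB default.
        factory.getPolicy().setInputBufferSize(16 * 1024);
        factory.register(EchoSocket.class);
    }
}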
Issue #6566 - use the demand API for the websocket MessageSinks
The MessageSinks now use the websocket-core demand API (per the change referenced above). @nezoat if you can try out the current 11.0.x branch to see if that resolves your issue, that would be great. Otherwise you could wait to try it out when 10.0.7/11.0.7 are released.
FYI, Jetty 10.0.7 and 11.0.7 were released with this fix.
In websocket-jetty-server 11.0.6, the accept method of the class "org.eclipse.jetty.websocket.core.internal.messages.DispatchedMessageSink" starts a new Thread (NOT using a thread pool). In my app this is causing a new thread to be created for every websocket message (not per connection). As a result, I am seeing about 4x total app CPU usage. As a workaround, I updated the code to use a thread pool and now see low CPU usage again.
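To illustrate the difference between the reported behaviour and the workaround (a hypothetical sketch only, not the actual DispatchedMessageSink code or the eventual Jetty patch):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration of per-message dispatch through a shared pool.
final class PooledDispatcher {
    private final ExecutorService executor = Executors.newCachedThreadPool();

    void dispatch(Runnable messageTask) {
        // Reported behaviour: new Thread(messageTask).start();  // one fresh thread per message
        executor.execute(messageTask); // workaround: reuse pooled threads instead
    }
}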