Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

memory leak when running 2 atmosphere instances in one VM #43

Closed
cproinger opened this issue Oct 23, 2011 · 15 comments
Closed

memory leak when running 2 atmosphere instances in one VM #43

cproinger opened this issue Oct 23, 2011 · 15 comments
Assignees

Comments

@cproinger
Copy link

hi jeanfrancois.

i use atmosphere 0.7.2 (i tried 0.8.0-RC2 but with this version something goes wrong, but i couldn't quite make out what exactly "The current Broadcaster has been destroyed and cannot be re-used. Recreating a new one with the same name. You can turn off that mechanism by adding, in web.xml, org.atmosphere.cpr.recoverFromDestroyedBroadcaster set to false" => turned it off, doesn't work either), jersey 1.9.1 and jetty 7.5.1.v20110908 (Jetty7CometSupport).

heres my scenario. i'm performing benchmark tests with 2 programmatically configured and bootstrapped jetty instances with jersey and atmosphere. (every instance-specific objects i bind in the servletContext) one instance sequentially calls the other using AsyncHTTPClient invoking a atmosphere annotaded method like this.

    @Path("/read")
    @POST
    @Suspend(scope = SCOPE.REQUEST, resumeOnBroadcast = true, outputComments = true, listeners = Listener.class)
    public Broadcastable readEntries(
            ReadEntriesRequest<Serializable> areq, 
            @HeaderParam(ResourceConstants.X_REQUEST_ID) final Broadcaster bc) {
            //first i put my the broadcaster together with a latch in a map so my worker and AtmosphereResourceListener can
            //synchronize with it. 
            lachtes.put(bc, countDownLatch);
            //.. here a thread is started that waits on the latch until the listeners onSuspend-Method is called
            // and then calls bc.broadcast(data);
            return new Broadcastable(null, bc);
    }

    //Listener-Class

    @Override
    public void onSuspend(
            AtmosphereResourceEvent<HttpServletRequest, HttpServletResponse> event) {
        //log.info("on suspend: " + event);
        Broadcaster bc = event.getResource().getBroadcaster();
        CountDownLatch l = latches.get(bc);
        l.countDown();
        latches.remove(bc);
    }

now my tests make ~200.000 requests like this sequentially. after some time theres no progess anymore. The GC is running all the time but cannot free any memory (i guess it's not throwing a OOM because my client stops requesting too at that time). so i made a heapdump and i saw that there are many JerseyBroadcasters still in Memory (20.091). (when i set Xmx to a high enough level my test completes but still the memory consumption raises the whole time when someone would expect it to be pretty constant since it's the same request over and over again)

i learned the hard way that atmosphere doesn't destroy those by default but i configured the "org.atmosphere.cpr.broadcasterLifeCyclePolicy" to be EMPTY_DESTROY (before that i ran out of memory much ealier). the heapdump now tells me (i analyzed it with the eclipse MemoryAnalyzerTool. i don't do that often so i may misinterpret this) that those Broadcasters are still in the DefaultBroadcasterFactory's ConcurrentHashMap (store).

i stored the heapdump. zipped it's 30 MB. if you could take a look at it i would really appreciate it. how can i make it available to you? should i create a git-repo for it or do you know an easier way?

greets christian

@jfarcand
Copy link
Member

Salut,

I need information on (2) things. The first is why RC2 is failling (I'm about to release it today). The message you are seeing just help when a suspended connection gets it's associated Broadcaster destroyed. In that case the suspended connection is still alive and if a broadcast occurs, without that property set the message will be lost. So what's the exact issue you are seeing?

For (2), anywhere I can download it will be fine...but the number of Broadcaster is for sure the issue...but are you resuming connection? It's one broadcaster per connection here (scope == request) so you may just hit the machine limits. Note that Jetty isn't cleaning client disconnection so that may also be an issue (you need to use the maxInnactive's property for reclaiming dead connection.

I will delay the release 24 hours as I want to make sure everything is OK first.

Thanks!

-- Jeanfrancois

@ghost ghost assigned jfarcand Oct 23, 2011
@cproinger
Copy link
Author

heres my stacktrace with 0.8.0-RC2

11:46:45.198 [Atmosphere-AsyncWrite-0] WARN o.a.cpr.AtmosphereResourceImpl - Broadcaster http://localhost:9880/requests/12114 has been destroyed and cannot be re-used. Recreating a new one with the same name. You can turn off that mechanism by adding, in web.xml, org.atmosphere.cpr.recoverFromDestroyedBroadcaster set to false
11:46:45.206 [qtp1757448463-62] ERROR o.a.h.ReflectorServletProcessor - onRequest()
java.lang.IllegalStateException: State==HEADER
at org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:822) ~[jetty-http-7.5.1.v20110908.jar:7.5.1.v20110908]
at org.eclipse.jetty.http.AbstractGenerator.blockForOutput(AbstractGenerator.java:536) ~[jetty-http-7.5.1.v20110908.jar:7.5.1.v20110908]
at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:177) ~[jetty-server-7.5.1.v20110908.jar:7.5.1.v20110908]
at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:96) ~[jetty-server-7.5.1.v20110908.jar:7.5.1.v20110908]
at com.sun.jersey.spi.container.servlet.WebComponent$Writer.write(WebComponent.java:307) ~[jersey-server-1.9.1.jar:1.9.1]
at com.sun.jersey.spi.container.ContainerResponse$CommittingOutputStream.write(ContainerResponse.java:134) ~[jersey-server-1.9.1.jar:1.9.1]
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:202) ~[na:1.6.0_23-ea]
at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:272) ~[na:1.6.0_23-ea]
at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:276) ~[na:1.6.0_23-ea]
at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:122) ~[na:1.6.0_23-ea]
at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:212) ~[na:1.6.0_23-ea]
at org.codehaus.jettison.mapped.MappedXMLStreamWriter.writeEndDocument(MappedXMLStreamWriter.java:246) ~[jettison-1.2.jar:1.2]
at com.thoughtworks.xstream.io.xml.StaxWriter.endNode(StaxWriter.java:109) ~[xstream-1.3.1.jar:na]
at com.thoughtworks.xstream.core.TreeMarshaller.start(TreeMarshaller.java:99) ~[xstream-1.3.1.jar:na]
at com.thoughtworks.xstream.core.AbstractTreeMarshallingStrategy.marshal(AbstractTreeMarshallingStrategy.java:38) ~[xstream-1.3.1.jar:na]
at com.thoughtworks.xstream.XStream.marshal(XStream.java:837) ~[xstream-1.3.1.jar:na]
at com.thoughtworks.xstream.XStream.marshal(XStream.java:826) ~[xstream-1.3.1.jar:na]
at com.thoughtworks.xstream.XStream.toXML(XStream.java:801) ~[xstream-1.3.1.jar:na]
at org.mozartspaces.web.XStreamJsonProvider.writeTo(XStreamJsonProvider.java:152) ~[classes/:na]
at com.sun.jersey.spi.container.ContainerResponse.write(ContainerResponse.java:306) ~[jersey-server-1.9.1.jar:1.9.1]
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1437) ~[jersey-server-1.9.1.jar:1.9.1]
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349) ~[jersey-server-1.9.1.jar:1.9.1]
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339) ~[jersey-server-1.9.1.jar:1.9.1]
at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416) ~[jersey-server-1.9.1.jar:1.9.1]
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537) ~[jersey-server-1.9.1.jar:1.9.1]
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:708) ~[jersey-server-1.9.1.jar:1.9.1]
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) [servlet-api-2.5.jar:2.5]
at org.atmosphere.util.AtmosphereFilterChain.doFilter(AtmosphereFilterChain.java:155) ~[atmosphere-runtime-0.8.0-RC2-SNAPSHOT.jar:0.8.0-RC2-SNAPSHOT]
at org.atmosphere.util.AtmosphereFilterChain.invokeFilterChain(AtmosphereFilterChain.java:116) ~[atmosphere-runtime-0.8.0-RC2-SNAPSHOT.jar:0.8.0-RC2-SNAPSHOT]
at org.atmosphere.handler.ReflectorServletProcessor$FilterChainServletWrapper.service(ReflectorServletProcessor.java:293) ~[atmosphere-runtime-0.8.0-RC2-SNAPSHOT.jar:0.8.0-RC2-SNAPSHOT]
at org.atmosphere.handler.ReflectorServletProcessor.onRequest(ReflectorServletProcessor.java:151) ~[atmosphere-runtime-0.8.0-RC2-SNAPSHOT.jar:0.8.0-RC2-SNAPSHOT]
at org.atmosphere.cpr.AsynchronousProcessor.action(AsynchronousProcessor.java:204) [atmosphere-runtime-0.8.0-RC2-SNAPSHOT.jar:0.8.0-RC2-SNAPSHOT]
at org.atmosphere.cpr.AsynchronousProcessor.suspended(AsynchronousProcessor.java:147) [atmosphere-runtime-0.8.0-RC2-SNAPSHOT.jar:0.8.0-RC2-SNAPSHOT]
at org.atmosphere.container.Jetty7CometSupport.service(Jetty7CometSupport.java:82) [atmosphere-runtime-0.8.0-RC2-SNAPSHOT.jar:0.8.0-RC2-SNAPSHOT]
at org.atmosphere.cpr.AtmosphereServlet.doCometSupport(AtmosphereServlet.java:1166) [atmosphere-runtime-0.8.0-RC2-SNAPSHOT.jar:0.8.0-RC2-SNAPSHOT]
at org.atmosphere.cpr.AtmosphereServlet.doPost(AtmosphereServlet.java:1134) [atmosphere-runtime-0.8.0-RC2-SNAPSHOT.jar:0.8.0-RC2-SNAPSHOT]
at javax.servlet.http.HttpServlet.service(HttpServlet.java:727) [servlet-api-2.5.jar:2.5]
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) [servlet-api-2.5.jar:2.5]
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:547) [jetty-servlet-7.5.1.v20110908.jar:7.5.1.v20110908]
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:481) [jetty-servlet-7.5.1.v20110908.jar:7.5.1.v20110908]
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:225) [jetty-server-7.5.1.v20110908.jar:7.5.1.v20110908]
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:940) [jetty-server-7.5.1.v20110908.jar:7.5.1.v20110908]
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:409) [jetty-servlet-7.5.1.v20110908.jar:7.5.1.v20110908]
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:186) [jetty-server-7.5.1.v20110908.jar:7.5.1.v20110908]
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:874) [jetty-server-7.5.1.v20110908.jar:7.5.1.v20110908]
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117) [jetty-server-7.5.1.v20110908.jar:7.5.1.v20110908]
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:110) [jetty-server-7.5.1.v20110908.jar:7.5.1.v20110908]
at org.eclipse.jetty.server.Server.handle(Server.java:349) [jetty-server-7.5.1.v20110908.jar:7.5.1.v20110908]
at org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:441) [jetty-server-7.5.1.v20110908.jar:7.5.1.v20110908]
at org.eclipse.jetty.server.HttpConnection$RequestHandler.content(HttpConnection.java:921) [jetty-server-7.5.1.v20110908.jar:7.5.1.v20110908]
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:784) [jetty-http-7.5.1.v20110908.jar:7.5.1.v20110908]
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:223) [jetty-http-7.5.1.v20110908.jar:7.5.1.v20110908]
at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:46) [jetty-server-7.5.1.v20110908.jar:7.5.1.v20110908]
at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:545) [jetty-io-7.5.1.v20110908.jar:7.5.1.v20110908]
at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:43) [jetty-io-7.5.1.v20110908.jar:7.5.1.v20110908]
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:598) [jetty-util-7.5.1.v20110908.jar:7.5.1.v20110908]
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:533) [jetty-util-7.5.1.v20110908.jar:7.5.1.v20110908]
at java.lang.Thread.run(Thread.java:619) [na:1.6.0_23-ea]

@cproinger
Copy link
Author

ad ".but are you resuming connection?". shoudln't the @Suspend-Attribute resumeOnBroadcast = true resume the connection? the client shouldn't disconnect the requests-timeout is set to infinity. i will try to provide a code example and the heapdump this evening (8-10 hours from now).

@jfarcand
Copy link
Member

OK I will try to reproduce that issue locally. Thanks for sharing!!

@jfarcand
Copy link
Member

FYI, I've added a similar test case

Client
Resource

So far no leak and no exception. Still working on it...

@cproinger
Copy link
Author

i wrote a testcase

https://github.com/cproinger/atmosphere/tree/master/samples/jetty-jersey-rest

for 0.7.2:
tough it takes significantly longer than with my application until it runs out of memory (actually it didn't run out of memory by now but converges the maxHeapSize since 20 minutes, that alone wouldn't concerne me but the GC-Activity is high for last 40 min).

the heap-dump is in the root of this project

for 0.8.0-RC2:
the test gets stuck however, the only odd thing i could observer was this warning

2011-10-24 22:28:19.021:WARN:oejs.HttpConnection:header full: java.lang.NullPointerException
on disconnect: AtmosphereResourceEventImpl{isResuming=false isCancelled=true, isResumedOnTimeout=false, message=null, resource=AtmosphereResourceImpl{, action=org.atmosphere.cpr.AtmosphereServlet$Action@6e75d758, broadcaster=org.atmosphere.jersey.JerseyBroadcaster, cometSupport=org.atmosphere.container.Jetty7CometSupport@7a8d0b0f, serializer=null, isInScope=true, useWriter=true, listeners=[com.cp.test.SomeResource$Listener@41d144a4]}, throwable=null}

@jfarcand
Copy link
Member

I cannot help for 0.7.2 as I'm trying to release 0.8 RC2. Are you using the latest SNAPSHOT? Do you think you can grab a SNAPSHOT? I can also reproduce the header NPE but that shoudn't hangs the process. Thanks for the help!

@jfarcand
Copy link
Member

The NPE seems to be from Jetty

https://gist.github.com/1310124

will add a workaround.

@cproinger
Copy link
Author

forked your repository this morning (10 hours ago), the test-case should use that version

@jfarcand
Copy link
Member

Ok the fix went in after. Sorry can you try with the latest?

@cproinger
Copy link
Author

  1. i ran the test case for 0.7.2. again and after 56 minutes it gives me a OOM.
  2. i pulled your changes and tried to install them locally. the atmosphere-jersey-module gives me 2 test-failures:

Test set: TestSuite

Tests run: 71, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 389.04 sec <<< FAILURE!
testConcurrentAndIdleResumePolicy(org.atmosphere.jersey.tests.ConcurrentResourceTest) Time elapsed: 60.174 sec <<< FAILURE!
org.testng.internal.thread.ThreadTimeoutException: Method public void org.atmosphere.jersey.tests.ConcurrentResourceTest.testConcurrentAndIdleResumePolicy() didn't finish within the time-out 60000
at org.testng.internal.MethodHelper.invokeWithTimeout(MethodHelper.java:735)
at org.testng.internal.Invoker.invokeMethod(Invoker.java:490)
at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:617)
at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:885)
at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:126)
at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:110)
at org.testng.TestRunner.runWorkers(TestRunner.java:712)
at org.testng.TestRunner.privateRun(TestRunner.java:582)
at org.testng.TestRunner.run(TestRunner.java:477)
at org.testng.SuiteRunner.runTest(SuiteRunner.java:324)
at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:319)
at org.testng.SuiteRunner.privateRun(SuiteRunner.java:292)
at org.testng.SuiteRunner.run(SuiteRunner.java:198)
at org.testng.TestNG.createAndRunSuiteRunners(TestNG.java:821)
at org.testng.TestNG.runSuitesLocally(TestNG.java:788)
at org.testng.TestNG.run(TestNG.java:708)
at org.apache.maven.surefire.testng.TestNGExecutor.run(TestNGExecutor.java:70)
at org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.executeMulti(TestNGDirectoryTestSuite.java:157)
at org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.execute(TestNGDirectoryTestSuite.java:97)
at org.apache.maven.surefire.testng.TestNGProvider.invoke(TestNGProvider.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:164)
at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:110)
at org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:172)
at org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcessWhenForked(SurefireStarter.java:104)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:70)
testConcurrentAndEmptyDestroyPolicy(org.atmosphere.jersey.tests.ConcurrentResourceTest) Time elapsed: 1.405 sec <<< FAILURE!
java.lang.AssertionError: expected: but was:
at org.testng.Assert.fail(Assert.java:84)
at org.testng.Assert.failNotEquals(Assert.java:438)
at org.testng.Assert.assertEquals(Assert.java:108)
at org.testng.Assert.assertEquals(Assert.java:129)
at org.testng.Assert.assertEquals(Assert.java:139)
at org.atmosphere.jersey.tests.ConcurrentResourceTest.testConcurrentAndEmptyDestroyPolicy(ConcurrentResourceTest.java:97)

@jfarcand
Copy link
Member

Ya, I need to fix the test. That test will pass if you run it standalone. BTW I've released 0.8.0-RC2. Let's see if that help. Anyway I will make sure this issue is fixed before making the final release. Thanks!

@cproinger
Copy link
Author

i built RC2 again skipping the tests and ran my testcase again with this version. the tests got stuck (0% CPU, no requets processed) again at ~12000 requests. so i looked at the threaddump to see if there were threads blocked or something. the funny thing was, after some time (i forgot to kill the process) requests got processed again and after some time it got stuck again, and so on and so on with ~4 minute pauses with no CPU Activity, in this time memory got allocated and garbage collected though.

@jfarcand
Copy link
Member

Salut...well I think I will need a test case from you and you you stress it. I apologize I fail to see what can cause such issue with a sample I've written. Thanks for your patience

@jfarcand
Copy link
Member

OK, I think the fix went in this morning (I was able to reproduce an OOM with SCOPE == REQUEST) and without any BroadcasterLifeCycle policy. In both branches. Closing, re-open if you still see it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants