Scaling Down Causes "Down" Replicas #754

Open
Kamalsaiperla opened this issue Jan 30, 2025 · 0 comments

Kamalsaiperla commented Jan 30, 2025

Environment
Solr Operator Version: 0.8.1 → 0.9.0 (same issue)
Solr Image Version: 9.6.1
Platform: GKE
Custom Plugins: Yes
HPA Configuration: Configured for CPU-based scaling

Issue Description
When the HPA's averageUtilization target is lowered to 10%, Solr pods scale up to maxReplicas (10) without issues. However, when the target is raised to 80% to trigger a scale-down, Solr does not reduce the number of pods, and several shards show replicas in the "Down" state.
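
For reference, the Kubernetes HPA's documented rule for a resource-utilization target is desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), clamped to [minReplicas, maxReplicas] and skipped when the ratio is within the default 10% tolerance. The Python sketch below applies that rule to show why a 10% target drives the count to maxReplicas and an 80% target should bring it back down. All pod counts, utilization figures, and the minReplicas value are hypothetical, and the sketch ignores the scale-down stabilization window (300 seconds by default), unready pods, and missing metrics.

```python
import math

# Default HPA tolerance (kube-controller-manager flag
# --horizontal-pod-autoscaler-tolerance): ratios within 10% of 1.0 cause no change.
TOLERANCE = 0.1


def desired_replicas(current_replicas: int, current_util: float, target_util: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Replica count the HPA would ask for, based on one CPU utilization metric."""
    ratio = current_util / target_util
    if abs(ratio - 1.0) <= TOLERANCE:
        return current_replicas  # close enough to the target: leave the count alone
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the HPA's [minReplicas, maxReplicas] range.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Hypothetical: 4 pods averaging 40% CPU with the target lowered to 10%.
    print(desired_replicas(4, 40.0, 10.0, min_replicas=3, max_replicas=10))  # 10
    # Hypothetical: 10 pods averaging 20% CPU with the target raised to 80%.
    print(desired_replicas(10, 20.0, 80.0, min_replicas=3, max_replicas=10))  # 3
```

In other words, with an 80% target and a lightly loaded cluster, the HPA itself should request far fewer than 10 pods; whether those pods are actually removed then depends on the Solr Operator completing its scale-down.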

Steps to Reproduce
1. Deploy the Solr Operator (0.8.1, later also tested with 0.9.0) with Solr 9.6.1.
2. Configure an HPA with CPU-based scaling.
3. Create collections and index documents.
4. Test 1: Decrease averageUtilization to 10% → pods scale up to 10 (expected behavior).
5. Test 2: Increase averageUtilization to 80% → pods do not scale down, and some shards show "Down" replicas.
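
The issue does not include the HPA spec itself. As a hedged illustration only, the sketch below builds an autoscaling/v2 HorizontalPodAutoscaler for the two tests as a Python dict and prints it as JSON (which `kubectl apply -f` accepts). It assumes the HPA targets the SolrCloud resource (apiVersion solr.apache.org/v1beta1, name "search", namespace "csr", taken from the operator log) through its scale subresource; the HPA name "search-hpa", minReplicas=3, and the function name `solrcloud_hpa` are hypothetical.

```python
import json


def solrcloud_hpa(name: str, namespace: str, min_replicas: int, max_replicas: int,
                  cpu_target: int) -> dict:
    """Build an autoscaling/v2 HPA manifest with a single CPU utilization target."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": f"{name}-hpa", "namespace": namespace},  # hypothetical name
        "spec": {
            # Assumption: scale the SolrCloud custom resource, not its StatefulSet.
            "scaleTargetRef": {
                "apiVersion": "solr.apache.org/v1beta1",
                "kind": "SolrCloud",
                "name": name,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {"type": "Utilization", "averageUtilization": cpu_target},
                },
            }],
        },
    }


if __name__ == "__main__":
    # Test 1 uses a 10% target; Test 2 would pass 80 instead.
    print(json.dumps(solrcloud_hpa("search", "csr", 3, 10, 10), indent=2))
```

Redirecting the output to a file (for example `hpa.json`) and running `kubectl apply -f hpa.json` would create or update the HPA; switching between the two tests only changes `cpu_target`.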

Expected Behavior
When averageUtilization is increased, pods should scale down according to the HPA settings.
Shards should not end up in "Down" state.

Observed Behavior
Pods remain at maxReplicas (10).
Some shards have "Down" replicas.
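
To list exactly which replicas are down, the Solr Collections API CLUSTERSTATUS action returns each replica's state and node, plus the cluster's live_nodes. The Python sketch below fetches that status and reports every replica that is not "active" or whose node is missing from live_nodes. The base URL (for example, reached through `kubectl port-forward` to a Solr pod) is an assumption, and the function names are illustrative rather than part of any Solr or operator client.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_cluster_status(base_url: str, collection: str) -> dict:
    """Call the Collections API CLUSTERSTATUS action for one collection."""
    query = urlencode({"action": "CLUSTERSTATUS", "collection": collection, "wt": "json"})
    with urlopen(f"{base_url}/admin/collections?{query}") as resp:
        return json.load(resp)


def unhealthy_replicas(status: dict) -> list[tuple[str, str, str, str]]:
    """Return (collection, shard, replica, state) for each replica that is not
    active, or whose node is missing from live_nodes."""
    cluster = status["cluster"]
    live_nodes = set(cluster.get("live_nodes", []))
    found = []
    for coll_name, coll in cluster.get("collections", {}).items():
        for shard_name, shard in coll.get("shards", {}).items():
            for replica_name, replica in shard.get("replicas", {}).items():
                state = replica.get("state", "unknown")
                # A replica on a node that has left the cluster is effectively down.
                if replica.get("node_name") not in live_nodes:
                    state += " (node not live)"
                if state != "active":
                    found.append((coll_name, shard_name, replica_name, state))
    return found
```

A typical call would be `unhealthy_replicas(fetch_cluster_status("http://localhost:8983/solr", "l5RecommendationCollection"))`, with the host and port adjusted to however Solr is exposed.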

Additional Information
Upgrading the Solr Operator from 0.8.1 to 0.9.0 did not resolve the issue.

Logs:
2025-01-30 16:58:24.643 ERROR (qtp1155769010-5575-search-solrcloud-4.csr-58880) [c:l5RecommendationCollection s:shard2 r:core_node502 x:l5RecommendationCollection_shard2_replica_n501 t:search-solrcloud-4.csr-58880] o.a.s.u.UpdateLog Exception reading versions from log => java.io.EOFException
    at org.apache.solr.common.util.FastInputStream.readUnsignedByte(FastInputStream.java:79)
java.io.EOFException: null
    at org.apache.solr.common.util.FastInputStream.readUnsignedByte(FastInputStream.java:79) ~[?:?]
    at org.apache.solr.common.util.FastInputStream.readInt(FastInputStream.java:239) ~[?:?]
    at org.apache.solr.update.TransactionLog$FSReverseReader.<init>(TransactionLog.java:889) ~[?:?]
    at org.apache.solr.update.TransactionLog.getReverseReader(TransactionLog.java:705) ~[?:?]
    at org.apache.solr.update.UpdateLog$RecentUpdates.update(UpdateLog.java:1613) ~[?:?]
    at org.apache.solr.update.UpdateLog$RecentUpdates.<init>(UpdateLog.java:1528) ~[?:?]
    at org.apache.solr.update.UpdateLog.getRecentUpdates(UpdateLog.java:1727) ~[?:?]
    at org.apache.solr.handler.component.RealTimeGetComponent.processGetVersions(RealTimeGetComponent.java:1262) ~[?:?]
    at org.apache.solr.handler.component.RealTimeGetComponent.process(RealTimeGetComponent.java:161) ~[?:?]
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:465) ~[?:?]
    at org.apache.solr.handler.RealTimeGetHandler.handleRequestBody(RealTimeGetHandler.java:43) ~[?:?]
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:226) ~[?:?]
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2886) ~[?:?]
    at org.apache.solr.servlet.HttpSolrCall.executeCoreRequest(HttpSolrCall.java:910) ~[?:?]
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:596) ~[?:?]
    at org.apache.solr.servlet.SolrDispatchFilter.dispatch(SolrDispatchFilter.java:262) ~[?:?]
    at org.apache.solr.servlet.SolrDispatchFilter.lambda$doFilter$0(SolrDispatchFilter.java:219) ~[?:?]
    at org.apache.solr.servlet.ServletUtils.traceHttpRequestExecution2(ServletUtils.java:249) ~[?:?]
    at org.apache.solr.servlet.ServletUtils.rateLimitRequest(ServletUtils.java:215) ~[?:?]
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:213) ~[?:?]
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195) ~[?:?]
    at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:210) ~[jetty-servlet-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1635) ~[jetty-servlet-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:527) ~[jetty-servlet-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:131) ~[jetty-server-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:598) ~[jetty-security-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122) ~[jetty-server-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:223) ~[jetty-server-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1580) ~[jetty-server-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:221) ~[jetty-server-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1384) ~[jetty-server-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:176) ~[jetty-server-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:484) ~[jetty-servlet-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1553) ~[jetty-server-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:174) ~[jetty-server-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1306) ~[jetty-server-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:129) ~[jetty-server-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:149) ~[jetty-server-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:228) ~[jetty-server-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:141) ~[jetty-server-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122) ~[jetty-server-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:301) ~[jetty-rewrite-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122) ~[jetty-server-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:822) ~[jetty-server-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122) ~[jetty-server-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.server.Server.handle(Server.java:563) ~[jetty-server-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.server.HttpChannel$RequestDispatchable.dispatch(HttpChannel.java:1598) ~[jetty-server-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:753) ~[jetty-server-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:501) ~[jetty-server-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.server.HttpChannel.run(HttpChannel.java:461) ~[jetty-server-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.runTask(AdaptiveExecutionStrategy.java:421) ~[jetty-util-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.consumeTask(AdaptiveExecutionStrategy.java:390) ~[jetty-util-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:277) ~[jetty-util-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.produce(AdaptiveExecutionStrategy.java:193) ~[jetty-util-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.http2.HTTP2Connection.produce(HTTP2Connection.java:208) ~[http2-common-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.http2.HTTP2Connection.onFillable(HTTP2Connection.java:155) ~[http2-common-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.http2.HTTP2Connection$FillableCallback.succeeded(HTTP2Connection.java:450) ~[http2-common-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:100) ~[jetty-io-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53) ~[jetty-io-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.runTask(AdaptiveExecutionStrategy.java:421) ~[jetty-util-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.consumeTask(AdaptiveExecutionStrategy.java:390) ~[jetty-util-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:277) ~[jetty-util-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.run(AdaptiveExecutionStrategy.java:199) ~[jetty-util-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:411) ~[jetty-util-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:969) ~[jetty-util-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1194) ~[jetty-util-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1149) ~[jetty-util-10.0.20.jar:10.0.20]
    at java.base/java.lang.Thread.run(Unknown Source) [?:?]
2025-01-30 16:58:24.643 INFO (qtp1155769010-5575-search-solrcloud-4.csr-58880) [c:l5RecommendationCollection s:shard2 r:core_node502 x:l5RecommendationCollection_shard2_replica_n501 t:search-solrcloud-4.csr-58880] o.a.s.c.S.Request webapp=/solr path=/get params={distrib=false&qt=/get&fingerprint=false&getVersions=100&wt=javabin&version=2} status=0 QTime=0
2025-01-30 16:58:24.644 INFO (qtp1155769010-6634-search-solrcloud-4.csr-58881) [c:l5RecommendationCollection s:shard2 r:core_node558 x:l5RecommendationCollection_shard2_replica_n557 t:search-solrcloud-4.csr-58881] o.a.s.c.S.Request webapp=/solr path=/get params={distrib=false&qt=/get&fingerprint=false&getVersions=100&wt=javabin&version=2} status=0 QTime=0
2025-01-30 16:41:20.744 INFO (zkCallback-13-thread-61) [c:l5RecommendationCollection s:shard2 r:core_node490 x:l5RecommendationCollection_shard2_replica_n489 t:] o.a.s.u.PeerSync PeerSync: core=l5RecommendationCollection_shard2_replica_n489 url=http://search-solrcloud-9.csr:80/solr Received 29 versions from http://search-solrcloud-5.csr:80/solr/l5RecommendationCollection_shard2_replica_n97/ fingerprint:null
ERROR (recoveryExecutor-10-thread-212-processing-l5RecommendationCollection_shard3_replica_n589 search-solrcloud-0.csr-62278 move-replicas-search-solrcloud-941610687021459 core_node590 create search-solrcloud-4.csr:80_solr l5RecommendationCollection shard3) [c:l5RecommendationCollection s:shard3 r:core_node590 x:l5RecommendationCollection_shard3_replica_n589 t:search-solrcloud-0.csr-62278] o.a.s.h.ReplicationHandler Index fetch failed => org.apache.solr.common.SolrException: Unable to download _7s2.fdt completely. Downloaded 193986560!=400378449
ERROR (recoveryExecutor-10-thread-212-processing-l5RecommendationCollection_shard3_replica_n589 search-solrcloud-0.csr-62278 move-replicas-search-solrcloud-941610687021459 core_node590 create search-solrcloud-4.csr:80_solr l5RecommendationCollection shard3) [c:l5RecommendationCollection s:shard3 r:core_node590 x:l5RecommendationCollection_shard3_replica_n589 t:search-solrcloud-0.csr-62278] o.a.s.c.RecoveryStrategy Error while trying to recover => org.apache.solr.common.SolrException: Replication for recovery failed.
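
The ERROR entries above name the affected cores directly: for example, core_node590 on shard3 fetched only 193986560 of 400378449 bytes of _7s2.fdt (roughly 48%) before replication for recovery failed. To pull those identifiers out of a longer log, a regex-based Python sketch might look like the following. The patterns assume the bracketed context layout and message wording seen in these particular lines; other log configurations may differ, and `summarize` is a hypothetical helper name.

```python
import re

# Context block in these Solr log lines: [c:<collection> s:<shard> r:<replica> x:<core> t:<trace>]
MDC_RE = re.compile(
    r"\[c:(?P<collection>[^\s\]]+) s:(?P<shard>[^\s\]]+) r:(?P<replica>[^\s\]]+)"
    r" x:(?P<core>[^\s\]]+)(?: t:[^\]]*)?\]"
)
# ReplicationHandler message for a partially fetched index file.
DOWNLOAD_RE = re.compile(
    r"Unable to download (?P<file>\S+) completely\. Downloaded (?P<got>\d+)!=(?P<want>\d+)"
)


def summarize(line: str) -> dict:
    """Extract collection/shard/replica/core and any partial-download figures."""
    info: dict = {}
    ctx = MDC_RE.search(line)
    if ctx:
        info.update(ctx.groupdict())
    dl = DOWNLOAD_RE.search(line)
    if dl:
        info["file"] = dl["file"]
        # Share of the file that arrived before the fetch was cut off.
        info["fraction_downloaded"] = int(dl["got"]) / int(dl["want"])
    return info
```

Feeding every ERROR line of a pod's log through `summarize` and grouping by shard gives a quick view of which replicas failed recovery during the scale-down.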

Operator log:
2025-01-30T17:19:35Z INFO Found async status {"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"search","namespace":"csr"}, "namespace": "csr", "name": "search", "reconcileID": "0b0fae61-ad23-44c0-8286-c4fe88f3aecb", "evictionReason": "scaleDown", "requestId": "move-replicas-search-solrcloud-9", "state": "running"}
