Slow cache under moderate simultaneous load #80
Comments
Hey @sjmueller, first of all, thanks for the very detailed explanation and information; it will be very helpful for reproducing the scenario and seeing what could be happening. I'll dig into it! On the other hand, yes, the Redis adapter is not compatible with v2 yet, but it is my top priority now. I'm aiming to push the fixes so NebulexRedisAdapter can be compatible with v2 as soon as possible; most likely it will be ready by the end of next week (maybe before 🤞). I'd suggest two quick tests: 1) change the backend to … BTW, out of curiosity, did you have this issue with the previous version 1.2 or 1.1, or is it something new with v2?
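As a rough illustration of the backend suggestion, here is where such an option typically lives in a Nebulex v2 cache configuration. The module name and values are assumptions for illustration only, and the exact option nesting can differ between adapters and versions:

```elixir
# config/config.exs -- illustrative values, not the reporter's actual configuration
config :my_app, MyApp.Cache,
  # The local storage used by a Nebulex cache can switch its backend,
  # e.g. between :ets and :shards (sharded/partitioned ETS tables).
  backend: :shards,
  gc_interval: :timer.hours(12)
```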
Hi @cabol, thanks for the quick response. We switched directly from mnesia to nebulex
Ok, we are using an ElastiCache Redis instance in AWS now with Nebulex
Thanks for the feedback! I was checking out the partitioned adapter implementation in v2 and v1.2, and there are no big differences implementation-wise; both use the same …
Hey! I did several benchmark tests with the partitioned cache (mostly a first attempt to identify any kind of issue with the partitioned adapter). Using Benchee, I ran the following test scenarios (a rough sketch of the Benchee setup is included after the scenario list):
Scenario 1
Scenario 2
Scenario 3
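For reference, a minimal sketch of how a Benchee run against a cache can be set up. The cache module, key scheme, and parallelism level are assumptions for illustration; this is not the content of the scenarios above:

```elixir
# Hypothetical benchmark: exercise get/put against an assumed MyApp.Cache
# with 8 concurrent benchmark processes (parallel: 8).
Benchee.run(
  %{
    "get" => fn -> MyApp.Cache.get("key:#{:rand.uniform(1_000)}") end,
    "put" => fn -> MyApp.Cache.put("key:#{:rand.uniform(1_000)}", %{count: 1}) end
  },
  warmup: 2,
  time: 10,
  parallel: 8
)
```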
Insights
About your use-case

I was thinking about your use-case; you have:

```elixir
result =
  Enum.reduce(Domain.Repo.all(members_count_query), %{}, fn item, map ->
    Map.put_new(map, "ConversationMembersCount:#{item.conversation_id}", item)
  end)

NebulexCache.put_all(result, on_conflict: :override)

NebulexCache.get("ConversationMembers:#{conversation_id}")
```

Questions
@sjmueller any feedback on this? As I explained in my previous comment, I ran several benchmark tests but couldn't reproduce the issue. Could you give me more details about your scenario (check my questions in the previous comment)?
Hi @cabol, circling back here. It turns out there were some areas where we were caching full serialized objects, and doing so in sequential fashion. For example, we might loop through and write 100 user objects to the cache for each API request, and this added up under simultaneous load. For some reason this performed much better with the Redis adapter. Furthermore, we've optimized these scenarios by using Redis pipelines (via the Nebulex adapter), so things are much more efficient now. Hope this helps.
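For illustration, a minimal sketch of the batching idea described above, assuming a hypothetical `MyApp.Cache` (a Nebulex cache), a `users` list, and a made-up key scheme; this is not the poster's actual code:

```elixir
# Sequential writes: one cache round trip per user, which adds up under load.
for user <- users do
  MyApp.Cache.put("User:#{user.id}", user)
end

# Batched alternative: collapse the writes into a single put_all call, which
# a Redis-backed cache can serve with far fewer round trips (e.g. pipelined).
entries = Map.new(users, fn user -> {"User:#{user.id}", user} end)
MyApp.Cache.put_all(entries)
```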
Absolutely, it helps a lot, thanks for the feedback; I'm glad to hear you were able to sort it out by using the Redis adapter. That is precisely the idea of Nebulex: being able to choose the adapter and topology that best fit your needs, as in this case. In fact, I remember running some benchmark tests with the Redis adapter against a 5-node Redis Cluster and with the partitioned adapter on the same nodes connected by Distributed Erlang/Elixir, and I got better results with the Redis one. But anyway, thanks again, this is very helpful because it gives me a better idea of the scenario; I will check and see if the performance can be improved.
Honestly, I love what you've built here with nebulex, because it models exactly the way I think about caching, i.e. the ability to annotate functions so that caching does its job but not at the expense of the original contract. All this with flexibility and no lock-in! We're currently using nebulex in a more manual, centralized fashion but can't wait to set aside time and refactor to the idiomatic approach. All the work you've done is greatly appreciated 🙏 keep it up!
Great to hear that 😄! And of course, there is still a long TODO list!
Closing this issue for now. Once I have more information about it, and if it can be improved somehow, I will create a separate issue for the enhancement.
We have two API nodes in our cluster and have Nebulex v2.0.0-rc.0 set up with `Nebulex.Adapters.Partitioned`. Under regular circumstances, accessing the cache is decently fast, under 50ms. But under semi-heavy load, we had put/delete transactions that were taking 2+ seconds. Originally we thought adding keys to the transactions would help, but performance continued to be subpar. So we removed the transactions entirely and still have the same problem!

Some details about our setup:

- `Nebulex.Adapters.Partitioned` with this configuration (see the setup sketch after this post):
- Under small load of <100 simultaneous users, almost all cache actions execute in <1ms, with some outliers up to 20ms, which is the performance we would expect.
- When we have sudden semi-heavy load (e.g. after a mass push notification where 500 people open the app at the same time), the cache gets incredibly slow, resulting in data not returning to everyone's app for up to 1 minute (!!!)
- You can see how all cache calls start to balloon here to beyond 1s, and we've even seen longer, pushing 3-5s and higher, even without using transactions:
- We have checked CPU utilization on the API nodes; even under the heaviest load, the peak is less than 38%.

As you can imagine, this is really hampering our ability to scale with our app growth! We have tried to move to a simpler, single-node Redis setup that avoids partitioning/replication using the official adapter, but [v2.0.0-rc.0 compatibility has stopped us](cabol/nebulex_redis_adapter#21). Any help would be appreciated!
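For context, a minimal sketch of what a partitioned-cache setup along these lines typically looks like in Nebulex v2. The module name and option values are assumptions rather than the reporter's actual configuration, and option names can vary between releases:

```elixir
# Cache module backed by the partitioned adapter (assumed application name).
defmodule MyApp.Cache do
  use Nebulex.Cache,
    otp_app: :my_app,
    adapter: Nebulex.Adapters.Partitioned
end
```

```elixir
# config/config.exs -- illustrative values only
config :my_app, MyApp.Cache,
  # Options for the local primary storage on each node (assumed nesting).
  primary: [
    gc_interval: :timer.hours(12)
  ]
```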