Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sending event id to Sidekiq instead of serialized event #755

Closed
matiasgarcia opened this issue Aug 21, 2020 · 9 comments
Closed

Sending event id to Sidekiq instead of serialized event #755

matiasgarcia opened this issue Aug 21, 2020 · 9 comments

Comments

@matiasgarcia
Copy link

In the docs, there is some suggested code to manage async handlers on Sidekiq.

The Scheduler is as follows:

module RailsEventStore
  class SidekiqScheduler
    def call(klass, serialized_event)
      klass.perform_async(serialized_event.to_h)
    end

    def verify(subscriber)
      subscriber.is_a?(Class) && subscriber < Sidekiq::Worker
    end
  end
end

And the AsyncHandler from the library is:

module RailsEventStore
  module AsyncHandler
    def perform(payload)
      super(Rails.configuration.event_store.deserialize(payload.symbolize_keys))
    end
  end
end

This works perfectly fine but we found ourselves hitting max memory error on Redis under heavy load. This is when I realized that the Scheduler serializes the whole event.

I was wondering if it's possible to provide just the event_id and write a custom AsyncHandler to retrieve it.

Like this:

module RailsEventStore
  class SidekiqScheduler
    def call(klass, serialized_event)
      klass.perform_async(serialized_event.event_id)
    end

    def verify(subscriber)
      subscriber.is_a?(Class) && subscriber < Sidekiq::Worker
    end
  end
end
module RailsEventStore
  module AsyncHandler
    def perform(payload)
      event = RailsEventStoreActiveRecord::Event.find(payload)
      super(event)
    end
  end
end

Or is there any problem on doing so?

@mostlyobvious
Copy link
Member

I was wondering if it's possible to provide just the event_id and write a custom AsyncHandler to retrieve it.

That sounds reasonable and totally valid approach as long as it is scheduled "after commit" (so the event is already committed and visible outside any possible transaction wrapping publish process).

Nitpicks:

  • RailsEventStoreActiveRecord::Event is a private detail of an adapter, you shouldn't use it directly — there's an API on Client for retrieving event by its id https://railseventstore.org/docs/read/#reading-specific-events
  • I'd prefer having your app specific namespace to redefining modules and classes from RailsEventStore namespace for scheduler and handler behaving differently than an upstream one, you can swap these components in configuration:
event_store = RailsEventStore::Client.new(
  dispatcher: RubyEventStore::ComposedDispatcher.new(
                     RubyEventStore::ImmediateAsyncDispatcher.new(scheduler: MyScheduler.new),
                     RubyEventStore::Dispatcher.new)
)

@matiasgarcia
Copy link
Author

Thanks for the guidance @pawelpacana!

Our rails event store is configured as follows:

  event_store = RailsEventStore::Client.new(
    dispatcher: RubyEventStore::ComposedDispatcher.new(
      RailsEventStore::AfterCommitAsyncDispatcher.new(scheduler: RailsEventStore::SidekiqScheduler.new),
      RubyEventStore::Dispatcher.new
    )
  )

Could it be that with RailsEventStore::AfterCommitAsyncDispatcher I don't really have to publish events in an after_commit hook but instead after_save/update/destroy should be enough?

@mpraglowski
Copy link
Member

There are two additional callbacks that are triggered by the completion of a database transaction: after_commit and after_rollback. These callbacks are very similar to the after_save callback except that they don't execute until after database changes have either been committed or rolled back. They are most useful when your active record models need to interact with external systems which are not part of the database transaction.

from: https://guides.rubyonrails.org/active_record_callbacks.html#transaction-callbacks

The "external systems which are not part of the database transaction" the documentation mentions here is the Redis used by Sidekiq to store scheduled jobs.

Could it be that with RailsEventStore::AfterCommitAsyncDispatcher I don't really have to publish events in an after_commit hook but instead after_save/update/destroy should be enough?

Then you are at risk of running handlers (Sidekiq jobs) for events that have not been stored because transaction has been rolled back.

@mostlyobvious
Copy link
Member

@matiasgarcia with RailsEventStore::AfterCommitAsyncDispatcher you're fine — the dispatcher would not schedule anything before commit or after rollback

ApplicationRecord.transaction do
   # do something
   event_store.publish(...) # not scheduling yet, as others might not see uncommitted changes from this transaction or it might be rolledback
end
# scheduling now

@matiasgarcia
Copy link
Author

Thanks @pawelpacana.

I was wondering if maybe a maintained version of this AsyncHandler should be in the gem? I am saving 92% of my Redis storage with this change.

@mostlyobvious
Copy link
Member

It's an interesting tradeoff to save on redis memory in exchange for additional query on a database. I'm not sure yet what is better as a default choice in AsyncHandler.

I'd accept PR with and alternative or configurable strategy for AsyncHandler for now 👍

@paneq
Copy link
Member

paneq commented Sep 5, 2020

@pawelpacana I like the idea!

I think we could change the implementation but keep the dispatcher/scheduler contract of (event, serialized_event) the same so that if someone wants to use the serialized version they can still do it. It's beneficial in case the consumer does not have access to RES storage.

@matiasgarcia
Copy link
Author

matiasgarcia commented Nov 2, 2020

It's an interesting tradeoff to save on Redis memory in exchange for an additional query on a database. I'm not sure yet what is better as a default choice in AsyncHandler.

I'd accept PR with and alternative or configurable strategy for AsyncHandler for now

In our case, we were firing around 200k events in a short time span (10 minutes) with a Redis DB of 100MB storage. This made it quickly reach the maximum threshold, because in our events we were tracking changes to AR models, making the payload quite big. So in this specific use case, it paid off to just send the event id. The additional query to fetch the event should be fairly quick since they are ids, although I don't have a benchmark to guarantee this :)

@mostlyobvious
Copy link
Member

For the record and anyone looking for clear transition path:

  • following handler mixin works well both with full payload and just an event_id
module EventFromEventId
  def perform(payload)
    event_id = payload.symbolize_keys.fetch(:event_id)
    super(event_store.read.event(event_id))
  end
end
  • passing slice of the event payload containing event_id, for one-way compatibility
class Scheduler
  def call(klass, serialized_event)
    klass.perform_later({ event_id: serialized_event.event_id })
  end
  def verify(subscriber)
    Class === subscriber && !!(subscriber < ActiveJob::Base)
  end
end

mostlyobvious added a commit that referenced this issue Nov 3, 2022
Pro:
- refactoring-friendly (i.e. migrating event_type) because event data is
  not stored temporarily in redis, one moving part to care about less
- constant size in redis as opposed to size dependent on data and
  metadata
- simpler configuration (no need for marching scheduler and async
  handler serializers)

Con:
- handlers require database access (they usually do have it)
- additional SQL query to load an event

Serialization cost remains unchanged. Whether passing an id or
full-serialized payload, the data and metadata need to be deserialized.

Also there's no additional serialization step for scheduler, it reuses
serialization for repository if that is needed.

#755
lukaszreszke pushed a commit that referenced this issue Feb 8, 2023
Pro:
- refactoring-friendly (i.e. migrating event_type) because event data is
  not stored temporarily in redis, one moving part to care about less
- constant size in redis as opposed to size dependent on data and
  metadata
- simpler configuration (no need for marching scheduler and async
  handler serializers)

Con:
- handlers require database access (they usually do have it)
- additional SQL query to load an event

Serialization cost remains unchanged. Whether passing an id or
full-serialized payload, the data and metadata need to be deserialized.

Also there's no additional serialization step for scheduler, it reuses
serialization for repository if that is needed.

#755
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants