Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat: event factory support #12928

Closed
wants to merge 12 commits into from

Conversation

kares
Copy link
Contributor

@kares kares commented May 25, 2021

What does this PR do?

This PR is motivated by having a way of re-using LogStash::Event.from_json with target => [namespace] (e.g. in the JSON codec). We benchmarked the generic LogStash::Json parsing (which has JRuby parsing hooks) and found it noticeably slower compared to the Java native Event.fromJson.

A clean way forward would be to introduce a notion of an EventFactory with a newEvent (new_event) interface.
The factory is meant to be later re-used/extended with ECS compatibility (new_event behaving slightly differently in legacy vs ecs mode), this is out-of-scope here, still something to keep in mind.

Checklist

  • My code follows the style guidelines of this project
  • [ ] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files (and/or docker env variables)
  • I have added tests that prove my fix is effective or that my feature works

How to test this PR locally

Related

follow-up adapter mixin: logstash-plugins/logstash-mixin-event_support#2

@kares kares changed the title Fix: avoid cast exception while mapping from json Feat: event factory support May 26, 2021
@@ -234,21 +237,30 @@ public IRubyObject ruby_to_json(ThreadContext context, IRubyObject[] args)
try {
return RubyString.newString(context.runtime, event.toJson());
} catch (Exception e) {
throw RaiseException.from(context.runtime, RubyUtil.GENERATOR_ERROR, e.getMessage());
throw toRubyError(context, RubyUtil.GENERATOR_ERROR, e);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added these as I wanted to see (have) the Java causes

}
}

// @param value [String] the json string. A json object/map will newFromRubyArray to an array containing a single Event.
// and a json array will newFromRubyArray each element into individual Event
// @return Array<Event> array of events
@JRubyMethod(name = "from_json", required = 1, meta = true)
public static IRubyObject ruby_from_json(ThreadContext context, IRubyObject recv, RubyString value)
{
public static IRubyObject ruby_from_json(ThreadContext context, IRubyObject recv, RubyString value, final Block block) {
Copy link
Contributor Author

@kares kares Jun 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

slightly different that the Java fromJson(final String json, final EventFactory factory) signature.
the reason for that is the Ruby vs Java event duality, this slightly resembles the Java Event API (the EventFactory being a functional interface could be implemented as a block) while doing the Java wrapping/unwrapping seamlessly (as we start with EventFactory like implementations in Ruby).

HINT: was wondering for a while now why LogStash::Event isn't just a org.logstash.Event to reduce the difference between the Java and Ruby API (something to consider for the 8.0 future) - if this was considered while designing the Java event API (conceptually - the naming differences can be handled on the Ruby end by extending the Java class)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a lot of back-and-forth on this, but IIRC we explicitly avoided having any jruby-related infection of the Java Event API so that consumers of the API would have a much smaller surface area. The Ruby APIs are also really hard to break confidently because we don't have good data on their usage.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usage of the Block and FunctionalInterface for the EventFactory is clever, and in a really nice way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still consider the block-form to be a "hack" around needing to manually wrap returns from a Java EventFactory.
was trying to think about a potential future here: LogStash::Event = org.logstash.Event (whether it happens or not) in which case the block from would become (so we should consider it "internal").

The Ruby APIs are also really hard to break confidently because we don't have good data on their usage.

I didn't mean breaking the Ruby API, Java API would have stayed as is and the extra Ruby behavior would be scripted on top org.logstash.Event.class_eval { def get(arg); getField(arg); end } etc.

From a user's perspective the duality is confusing.

end

# @return an event factory
def build
Copy link
Contributor Author

@kares kares Jun 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

being a builder a named this build - also wanted the end result to set the new @event_factory for the plugin.
maybe it's nicer being explicit: self.event_factory = event_factory_builder.with_target(target).build
or having a event_factory_builder.with_target(target).set_event_factory! instead of build

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels subtly infectious and likely to be a source of surprises. I'm not sure we have found the right boundary.

A plugin developer should either be able to know that they are working with a targeted event factory at its call-site, or they should not need to care about it at all. As implemented, we end up in a weird hybrid where the behaviour of event_factory is dependent on code paths that aren't immediately discoverable.

I think there are two directions we could go with this.

  1. Absorb with_target into EventFactory, making its use explicit all the way through

    Can we reasonably make with_target a part of the EventFactory API?

    Something like the following would give a lot of flexibility in use while not being secretly magical.

    class EventFactory ## base "abstract" class to be subclassed?
      def with_target(target_field_reference)
        return self if target_field_reference.nil?
      
        TargetedEventFactory.new(self, target_field_reference)
      end
    end
        def register
          @targeted_event_factory = event_factory.with_target(@target)
        end
        
        def decode(data)
          events = LogStash::Event.fromJson(data, @targeted_event_factory)
        end
  2. Make EventFactorySupport#event_factory always automatically respect a provided @target. This feels even more magical and not quite fully thought through, but perhaps closer to a useful and maintainable boundary.

    module EventFactorySupport
      def event_factory
        @event_factory ||= begin # TODO: threadsafety?
          ef = default_event_factory
          ef = TargetedEventFactory.new(ef, @target) unless @target.nil?
          ef
        end
      end
    end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the way I was trying to approach this is as if we decided to later rely only on a Java EventFactory API,

  • having EventFactory#withTarget on the interface feels unfortunate, since the interface would be tight to a "targeted" factory implementation
  • a EventFactoryBuilder of some kind might be necessary (later) if we decide on the explicit approach - instead of the @event_factory magically resolving itself based on target and ecs_compatibility fields. the explicitness should give us more control for whatever we decide to do later (at a cost of having slightly more code in the plugins)

My current candidate(s) would be to break the builder from magically setting the event_factory :

  def register

    self.event_factory = event_factory_builder.with_target(target).event_factory
    
    # or being very explicit:
    
    self.event_factory = EventFactoryBuilder.new(default_event_factory).with_target(target).event_factory
    
  end

@kares kares requested a review from yaauie June 1, 2021 17:41
Copy link
Member

@yaauie yaauie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Early review:

Certainly on track. Modifying the Event#from_json and friends to optionally take an event factory is a really nice bridge to make the transition simple.

I have concerns with the ruby implementation's EventFactory builder, and how its use infects the plugin. I think we will need to workshop a couple of iterations on this, as my suggestions also aren't fully thought through. Since this is an API we will likely need to support for years to come, getting the boundary right will give us the freedom to swap out internals later if we find a need to do so.

I've left some comments in-line and am glad to pair with you on this if you want to drop an hour on my calendar.

end

# @return an event factory
def build
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels subtly infectious and likely to be a source of surprises. I'm not sure we have found the right boundary.

A plugin developer should either be able to know that they are working with a targeted event factory at its call-site, or they should not need to care about it at all. As implemented, we end up in a weird hybrid where the behaviour of event_factory is dependent on code paths that aren't immediately discoverable.

I think there are two directions we could go with this.

  1. Absorb with_target into EventFactory, making its use explicit all the way through

    Can we reasonably make with_target a part of the EventFactory API?

    Something like the following would give a lot of flexibility in use while not being secretly magical.

    class EventFactory ## base "abstract" class to be subclassed?
      def with_target(target_field_reference)
        return self if target_field_reference.nil?
      
        TargetedEventFactory.new(self, target_field_reference)
      end
    end
        def register
          @targeted_event_factory = event_factory.with_target(@target)
        end
        
        def decode(data)
          events = LogStash::Event.fromJson(data, @targeted_event_factory)
        end
  2. Make EventFactorySupport#event_factory always automatically respect a provided @target. This feels even more magical and not quite fully thought through, but perhaps closer to a useful and maintainable boundary.

    module EventFactorySupport
      def event_factory
        @event_factory ||= begin # TODO: threadsafety?
          ef = default_event_factory
          ef = TargetedEventFactory.new(ef, @target) unless @target.nil?
          ef
        end
      end
    end

* @throws IOException when (JSON) parsing fails
*/
public static Event[] fromJson(final String json) throws IOException {
return fromJson(json, EventFactory.DEFAULT);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 API continuity

}
}

// @param value [String] the json string. A json object/map will newFromRubyArray to an array containing a single Event.
// and a json array will newFromRubyArray each element into individual Event
// @return Array<Event> array of events
@JRubyMethod(name = "from_json", required = 1, meta = true)
public static IRubyObject ruby_from_json(ThreadContext context, IRubyObject recv, RubyString value)
{
public static IRubyObject ruby_from_json(ThreadContext context, IRubyObject recv, RubyString value, final Block block) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a lot of back-and-forth on this, but IIRC we explicitly avoided having any jruby-related infection of the Java Event API so that consumers of the API would have a much smaller surface area. The Ruby APIs are also really hard to break confidently because we don't have good data on their usage.

expect{LogStash::Event.from_json(nil)}.to raise_error
expect{LogStash::Event.new(LogStash::Json.load(nil))}.to raise_error
end
expect{LogStash::Event.from_json(nil)}.to raise_error # TypeError
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of just commenting the error type, let's assert it. If we don't like what is being raised in practice, we should take this as an opportunity to refine the implementation. BWC-wise, we only promise to raise here so raising something more specific and specifying that here is not a breaking change.

}
}

// @param value [String] the json string. A json object/map will newFromRubyArray to an array containing a single Event.
// and a json array will newFromRubyArray each element into individual Event
// @return Array<Event> array of events
@JRubyMethod(name = "from_json", required = 1, meta = true)
public static IRubyObject ruby_from_json(ThreadContext context, IRubyObject recv, RubyString value)
{
public static IRubyObject ruby_from_json(ThreadContext context, IRubyObject recv, RubyString value, final Block block) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usage of the Block and FunctionalInterface for the EventFactory is clever, and in a really nice way.

@yaauie
Copy link
Member

yaauie commented Jun 8, 2021

We had a call today, and I'd like to summarize what we discussed and the decisions that came out of it.

We decided to remove the idea of an EventFactory builder -- plugins don't need this much control, and giving it here locks us in and prevents us from changing out the EventFactory implementation in logstash core based on logstash's configuration.

We decided that the scope of this PR is to introduce the EventFactory interface with a single method EventFactory#new_event and two implementations (perhaps BasicEventFactory and TargetedEventFactory), and to effectively add three methods to the API of LogStash::Plugin:

  • Plugin#event_factory: returns a memoized EventFactory instance, that can be used to create events
  • Plugin#targeted_event_factory: returns a memoized EventFactory that respects the plugin's target configuration.
    • if no target is provided, this effectively is equivalent to Plugin#event_factory.
    • if a target is provided, returns a TargetedEventFactory that wraps the return value of Plugin#event_factory.
  • Plugin#create_event_factory: returns a new EventFactory based on the plugin's configuration and state. We may want to mark this as an "internal" or "advanced" API.

Upon writing this out, I think that:

  • We should document these methods as effectively-final, warning plugin developers from overriding them. There are a lot of possible sharp edges here, and any overriding will create a tight coupling that eventually prevents us from moving forward.
  • The Plugin#create_event_factory can probably be a private part of the API. I see no reason why a plugin would need to provide its own implementation, and doing so would prevent us from changing the internals in logstash core to support configuration-driven alternate event factories (such as factories that produce copy-on-write events, alternate field-reference-syntax events, etc.).
  • we should memoize to underscore-prefixed ivars in a concurrency-safe manner and document the ivar as an implementation detail that should not be relied on (such as @_event_factory and @_targeted_event_factory).

@kares kares force-pushed the from-json-improve-cast-error_7.x branch from d6afb30 to 81d6b71 Compare June 21, 2021 18:09
@kares kares force-pushed the from-json-improve-cast-error_7.x branch from fca920e to 84f781c Compare June 22, 2021 05:46
@kares
Copy link
Contributor Author

kares commented Jun 22, 2021

superseded by #13017

@kares kares closed this Jun 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants