Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API Generator and Removing support of Ruby 2 #261

Merged
merged 1 commit into from
Jan 8, 2025

Conversation

nhtruong
Copy link
Collaborator

@nhtruong nhtruong commented Dec 11, 2024

This PR contains an API generator for the OpenSearch Ruby client. This generator does not simply generate the API methods conforming to the how methods and namespaces are currently structured in this repo. Instead, the generated methods are in a new structure that is simpler, and move intuitive to use. There's also some refactoring of the existing API methods to streamline the code and fix some issues.

Module structure

Currently, each API method is added to the OpenSearch::Client class via a chain of includes. For example the health method of the cat namespace is added to the OpenSearch::Client class via the following chain of includes:

  • OpenSearch::Client includes OpenSearch::API
  • OpenSearch::API when included, triggers the inclusion of OpenSearch::API::Cat and other namespaces to the base class (i.e. OpenSearch::Client)
  • OpenSearch::API::Cat contains the cat method, which returns an instance of the OpenSearch::API::Cat::CatClient class.
  • OpenSearch::API::Cat::CatClient includes the OpenSearch::API::Cat::Actions module, which contains the health method.

The new structure is a a lot simpler: Within the definition of the OpenSearch::Client class itself, there's a method named cat that returns an instance of the OpenSearch::Namespace that includes the OpenSearch::API::Cat::Actions module where the health method is defined. Note that we also do not need to define a class for each namespace anymore.

How the perform_request method is invoked

Currently when you call client.cat.health:

  • It will first trigger the perform_request method of the ::CatClient, which it gets from the ::Common module.
  • That method, in turn, calls the perform_request method of the OpenSearch::Client class.
  • The OpenSearch::Client class actually does not define a perform_request method. So, method_missing is called
  • method_missing verifies the OpenSearch version before passing the baton to the perform_request method of OpenSearch::Transport::Client.

The new path is a lot simpler: when client.cat.health is called, it directly calls the perform_request method of OpenSearch::Transport::Client. Note that this also skips over the version check, which is not needed anymore.

Storing valid query parameters

Currently, each namespace defines a ParamsRegistry module that registers an array of valid query parameters symbols per API action. These are iterated to validate the query parameters passed to the API method.

With the new approach, the query parameter names are stored as a set of strings (instead of an array of symbols) in a CONSTANT named after the API action. This approach has the following benefits:

  • We no longer need to define a separate ParamsRegistry module for each namespace.
  • It's a lot faster to check if a query parameter is valid, since we can use the include? method of the Set class.
  • The use of strings over symbols allows for more flexibility in the query parameter names, especially those . in the notifications namespace.

Normalizing query parameters

Instead of simply cloning the arguments passed to the API method, the new approach creates a new hash where all keys are converted to strings. This is done to ensure that the query parameters are always in a consistent format, and add support for query parameters with special characters.

Validating query parameters

Currently, this validation is done through __validate_and_extract_params method, which iterates over the array of valid query params TWICE per param. The new validate_query_params method only needs to check if the query parameter is included in the set of valid query parameters.


This PR also removes the support of Ruby 2.x, which is a breaking change. Client version has also been bumped to 4.0.0

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@dblock
Copy link
Member

dblock commented Dec 11, 2024

Dealing with some Rubocop issues right now.

Highly encourage you to do this instead of trying to address Rubocop complaining:

rubocop -a
rubocop --auto-gen-config

@dblock
Copy link
Member

dblock commented Dec 12, 2024

Consider splitting this into the generator and generated code PRs?

@nhtruong
Copy link
Collaborator Author

@dblock There is a lot of refactoring that requires the existing API methods to also be updated (via running the generator) that it's impossible to do so without. I'll write up what's being change once I get all the tests to pass.

@nhtruong nhtruong force-pushed the api_generator branch 3 times, most recently from 5c49ab0 to fc3f775 Compare December 17, 2024 20:33
@nhtruong nhtruong changed the title API Generator API Generator and Removing support of Ruby 2 Dec 17, 2024
@nhtruong nhtruong marked this pull request as ready for review December 17, 2024 20:36
@nhtruong
Copy link
Collaborator Author

@dblock This is ready for review. There's a lot going on but basically:

  • The API Generator.
  • A lot more integration specs to test the generated API methods
  • Restructured OpenSearch::Client (Check the PR description for more details)
  • Refactored API methods. For example the delete method:

OLD:

def delete(arguments = {})
  raise ArgumentError, "Required argument 'index' missing" unless arguments[:index]
  raise ArgumentError, "Required argument 'id' missing" unless arguments[:id]

  headers = arguments.delete(:headers) || {}

  arguments = arguments.clone

  _id = arguments.delete(:id)

  _index = arguments.delete(:index)

  method = OpenSearch::API::HTTP_DELETE
  path   = "#{Utils.__listify(_index)}/_doc/#{Utils.__listify(_id)}"
  params = Utils.__validate_and_extract_params arguments, ParamsRegistry.get(__method__)

  body = nil
  if Array(arguments[:ignore]).include?(404)
    Utils.__rescue_from_not_found { perform_request(method, path, params, body, headers).body }
  else
    perform_request(method, path, params, body, headers).body
  end
end

ParamsRegistry.register(:delete, %i[
  wait_for_active_shards
  refresh
  ...
].freeze)

NEW:

def delete(args = {})
  args = Utils.normalize_arguments(args)
  raise ArgumentError, "Required argument 'id' missing" unless args['id']
  raise ArgumentError, "Required argument 'index' missing" unless args['index']

  _id = args.delete('id')
  _index = args.delete('index')

  ignore  = args.delete('ignore') || []
  headers = args.delete('headers') || {}
  body    = args.delete('body')
  method  = 'DELETE'
  url     = [_index, '_doc', _id].filter(&:present?).join('/').squeeze('/')

  Utils.validate_query_params args, DELETE_QUERY_PARAMS
  transport.perform_delete_request method, url, args, body, headers, ignore.include?(404)
end

DELETE_QUERY_PARAMS = Set.new(%w[
  if_primary_term
  if_seq_no
  ...
]).freeze

@Earlopain
Copy link
Collaborator

This looks pretty cool. I haven't checked it out too closely but have two comments for now:

  • The activesupport dependency seems heavy-handed. I see you are using present? but I would rather see that inlined. The previous implementation simply relied on nil and that seems fine to me. For your new version, I'd write it like this: url = [_index, '_msearch'].compact.join('/').squeeze('/). There are no unit tests (those all pass even if I remove it), so I'm not sure if there are other extensions I missed.
  • All arguments must now be passed with string keys. I see your reason but feel like this is a bit much. Apart from requiring a bunch of churn for already written code, I'd say that symbols are pretty established as the calling convention for ruby. Right now you can do indicies(id: "foobar") and it looks just like a normal method call, indicies("id" => "foobar") doesn't have the same feel. Actually, there is no limit on what a symbol can be, :"foo.bar" is a valid entry. But, that is probably not-so-commonly known syntax. Instead, how about args.delete(:id) || "args.delete("id")?

@nhtruong
Copy link
Collaborator Author

nhtruong commented Dec 17, 2024

The activesupport dependency seems heavy-handed. I see you are using present? but I would rather see that inlined. The previous implementation simply relied on nil and that seems fine to me. For your new version, I'd write it like this: url = [_index, '_msearch'].compact.join('/').squeeze('/). There are no unit tests (those all pass even if I remove it), so I'm not sure if there are other extensions I missed.

The main difference I want from present? vs not .nil? is accounting for empty strings. Though I do realize that it's rare for the user to accidentally pass an empty string as value.

All arguments must now be passed with string keys. I see your reason but feel like this is a bit much. Apart from requiring a bunch of churn for already written code, I'd say that symbols are pretty established as the calling convention for ruby. Right now you can do indicies(id: "foobar") and it looks just like a normal method call, indicies("id" => "foobar") doesn't have the same feel. Actually, there is no limit on what a symbol can be, :"foo.bar" is a valid entry. But, that is probably not-so-commonly known syntax. Instead, how about args.delete(:id) || "args.delete("id")?

That's not what Utils.normalize_argument does. It translates symbol keys into string keys. All code using symbols will still work (The only breaking change is removing support of Ruby 2. All of the refactoring done here does not affect the functionality of the API methods). The main advantage is that string keys will now also be accepted (It used to throw errors). Since we already have to Hash#clone the args, constructing a new hash with normalized keys does not create any extra overhead for the method. We can also invert that and makes all keys symbols. Though the response body will always have strings as keys. So, that's a bit of a mismatch.

@Earlopain
Copy link
Collaborator

The main difference I want from present? vs not .nil? is accounting for empty strings

Right, in that case you can write it like this:

[nil, "", "123", 456].reject { _1.nil? || (_1.is_a?(String) && _1.empty?) }

It's a bit more code but not much so and I would say this is worth it to drop for that, especially since it's all generated anyways.

As a rails-agnostic library I would never force activesupport on users (and if you absolutely must, cherry-pick what you need instead of loading every core extension with active_support/all.

That's not what Utils.normalize_argument does.

Apologies, I'm not sure how I missed that method. Carry on.

We can also invert that and makes all keys symbols

I think that will be marginally faster since it can avoid an allocation in case the key is already a symbol with to_sym instead of to_s but other than that I have no preference.

@dblock
Copy link
Member

dblock commented Dec 18, 2024

As a rails-agnostic library I would never force activesupport on users (and if you absolutely must, cherry-pick what you need instead of loading every core extension with active_support/all.

@nhtruong I know I said that my personal opinion is not to worry about this, but if the only thing we're using is present? then probably @Earlopain is right.

Copy link
Member

@dblock dblock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏

I am good with this with whatever revisions you want to make, I'd say my only must have is the need to have to duplicate args before .delete.

One of a future improvements would be to use keyword arguments instead of args = {}.

I see a bunch of deleted specs. Do you feel like we don't need to generate tests that call these generated methods?

@@ -1,6 +1,10 @@
# Upgrading
Major versions of OpenSearch introduce breaking changes that require careful upgrades of the client. Check the [Compatibility](COMPATIBILITY.md) doc to see which version of the client should be used against your OpenSearch cluster.

### Upgrade to OpenSearch Ruby 4
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any interface changes at all?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As in in the API methods?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but looks like none?

lib/opensearch.rb Show resolved Hide resolved
lib/opensearch/api/actions/asynchronous_search/delete.rb Outdated Show resolved Hide resolved
lib/opensearch/transport/client.rb Show resolved Hide resolved
lib/opensearch/transport/client.rb Show resolved Hide resolved
@nhtruong
Copy link
Collaborator Author

@dblock

I am good with this with whatever revisions you want to make, I'd say my only must have is the need to have to duplicate args before .delete.

Similar to #261 (comment)

One of a future improvements would be to use keyword arguments instead of args = {}.

It's actually already using keyword arguments. You can do client.indices.delete(index: :books) where {index: books} is args. args = {} simply accounts for methods not having any required arguments.

I see a bunch of deleted specs. Do you feel like we don't need to generate tests that call these generated methods?

Those are legacy unit tests that weren't useful in the first place. This is the only client that has unit tests for every API method to make sure that an error is thrown when a required param is not provided or when an illegal query param is provided. And especially with generated API method, if we're also generating these specs, it's pretty much testing if 2 == 2 IMO. The integration specs I added already samples different generated methods to make sure that the required params and illegal params logic was generated correctly.

@nhtruong
Copy link
Collaborator Author

nhtruong commented Dec 18, 2024

Thanks for the feedback, everyone. Here are the changes I'm going to work on when I get back to this:

  • Remove activesupport and use native Ruby methods instead.
  • Renaming Utils methods for better clarify and more inline with ruby conventions
  • Possibly invert normalize_args to create a hash of symbols instead of strings (This could get hairy with new API methods that has non alphanumerical characters in their param names)

@nhtruong nhtruong marked this pull request as draft December 19, 2024 18:23
Signed-off-by: Theo Truong <[email protected]>
@nhtruong nhtruong marked this pull request as ready for review December 20, 2024 22:18
@nhtruong
Copy link
Collaborator Author

This is ready for review again.
@Earlopain I did not invert the normalize method to translate all keys into symbols to avoid to_s for existing user's code because to_s is going to be called on all keys anyway during the construction of the full url in the transport layer.

Copy link
Member

@dblock dblock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to merge this, let's cut a beta so we can try it out.

@@ -1,6 +1,10 @@
# Upgrading
Major versions of OpenSearch introduce breaking changes that require careful upgrades of the client. Check the [Compatibility](COMPATIBILITY.md) doc to see which version of the client should be used against your OpenSearch cluster.

### Upgrade to OpenSearch Ruby 4
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but looks like none?

@@ -0,0 +1,16 @@
# SPDX-License-Identifier: Apache-2.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be capitalized Gemfile that's the standard ;)

@dblock dblock merged commit 9f933d5 into opensearch-project:main Jan 8, 2025
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants