Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce extensions for the Elasticsearch clients #10745

Merged
merged 5 commits into from
Jul 22, 2020

Conversation

gsmet
Copy link
Member

@gsmet gsmet commented Jul 15, 2020

@loicmathieu this is a continuation of your PR here #8290 with cleaner commits and some additional changes:

  • I fixed a few doc issues
  • I used the Feature class for the features
  • I cleaned up the package names for more consistency. Keep in mind that non-API runtime classes should be in the runtime subpackage
  • I added a description for the visible runtime extensions as it will be visible at code.quarkus.io.
  • I changed the bean initialization a bit by avoiding the @PostConstruct pattern.

All in all, its all minor changes but I wanted to make a list so that you have them in mind when reviewing (and for the next extensions).

Could you have a look and check everything is OK with you?

I preferred opening another PR so that we still have yours handy in case you want to compare or I missed something important.

@boring-cyborg boring-cyborg bot added area/core area/dependencies Pull requests that update a dependency file area/documentation area/hibernate-search Hibernate Search area/infra-automation anything related to CI, bots, etc. that are used to automated our infrastructure labels Jul 15, 2020
@gsmet gsmet requested a review from loicmathieu July 15, 2020 11:46
@gsmet
Copy link
Member Author

gsmet commented Jul 15, 2020

@yrodiere I think it's time for us to get your opinion on this. Thanks!

@gsmet gsmet added this to the 1.7.0 - master milestone Jul 15, 2020
Copy link
Member

@yrodiere yrodiere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a few comments here and there regarding missing features or potential bugs, but it looks good overall.

If we want the Hibernate Search Elasticsearch extension to use the same REST client, I believe we'll have to handle multi-client configurations first. Right now only one client can be defined, but Hibernate Search allows multiple backends, each using a different client targeting a different cluster.
Once we have that, I'll have to change Hibernate Search to allow Quarkus to provide its own client. I'd be in favor of moving all the client configuration over to your Elasticsearch REST client extension, and removing it from the Hibernate Search extension; that way, configuration would be more similar to Hibernate ORM and its datasources, and I believe it would be more true to Quarkus' "opinionated" spirit.

There's one more blocker for Hibernate Search to use this extension, though: AWS support. People who want to connect to AWS Elasticsearch Service need to sign their requests, which means they need to plug in custom request processors, which means they need access to the HTTP client builder at some point.
In Hibernate Search, we provide a dedicated extension that relies on an internal extension point. You'll probably need something similar (at least the extension point) if you want to allow people to use this client with AWS Elasticsearch Service. If you decide to go a step further and provide built-in support for AWS request signing, let's discuss it first, as I plan to rework what we do in Hibernate Search, too.


static RestClientBuilder createRestClientBuilder(ElasticsearchConfig config) {
List<HttpHost> hosts = config.hosts.stream().map(s -> new HttpHost(s.substring(0, s.indexOf(":")),
Integer.parseInt(s.substring(s.indexOf(":") + 1)), config.protocol)).collect(Collectors.toList());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the port be optional and default to 9200?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure tbh but we should at least change the config stuff to use InetSocketAddress IMHO.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mongodb, hibernate-search-elasticsearch and neo4j uses such a construct so I think it's OK here.

We can of course make the port optional but it's really standard to define elasticsearch URL this way so I wouldn't bother as it will implies a lot more branches in the configuration code.

builder.setHttpClientConfigCallback(new RestClientBuilder.HttpClientConfigCallback() {
@Override
public HttpAsyncClientBuilder customizeHttpClient(HttpAsyncClientBuilder httpClientBuilder) {
if (config.username.isPresent()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Hibernate Search, we issue a warning in this case if the protocol is set to http (the default), because then the password will be transmitted using a clear-text connection, which is a security issue on public networks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea.

*/
@ConfigItem
public Optional<Integer> ioThreadCounts;

Copy link
Member

@yrodiere yrodiere Jul 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Connections settings, which I think are quite important, are missing:

  • max_connections: the maximum number of connections to the cluster.
  • max_connections_per_route: the maximum number of connections to each node in the cluster.

See https://github.com/hibernate/hibernate-search/blob/9da99d7de8f346aa0ec81ae2ce0a1fa2c1657b2a/backend/elasticsearch/src/main/java/org/hibernate/search/backend/elasticsearch/client/impl/ElasticsearchClientFactoryImpl.java#L201-L200

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's because of https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/_others.html . They basically send you to the Apache HTTP client doc.

I would trust Yoann on that one :).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I'm following. The settings I mentioned are settings of the underlying HTTP client, not of the Elasticsearch Rest client. So indeed, they don't have dedicated entries in the Elasticsearch Rest client documentation. They are still relevant, though. See https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/_others.html

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I understand it now, it's different things ;)

IOReactorConfig ioReactorConfig = IOReactorConfig.custom()
.setIoThreadCount(config.ioThreadCounts.get())
.build();
httpClientBuilder.setDefaultIOReactorConfig(ioReactorConfig);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have a ThreadProvider available yet inside the extension so I would not implement this now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, that's something specific to Hibernate Search. You can implement a much simpler ThreadFactory that simply creates threads with an appropriate name. For example like this: https://github.com/hibernate/hibernate-search/blob/9da99d7de8f346aa0ec81ae2ce0a1fa2c1657b2a/legacy/engine/src/main/java/org/hibernate/search/util/impl/SearchThreadFactory.java

That is, if you consider thread names are important.

.build();
httpClientBuilder.setDefaultIOReactorConfig(ioReactorConfig);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't seem to be using the "protocol" setting. You should use it here to disable SSL if it's not necessary, because SSL causes significant slowdowns on startup.

See https://github.com/hibernate/hibernate-search/blob/9da99d7de8f346aa0ec81ae2ce0a1fa2c1657b2a/backend/elasticsearch/src/main/java/org/hibernate/search/backend/elasticsearch/client/impl/ElasticsearchClientFactoryImpl.java#L204-L208

static Sniffer createSniffer(RestClient client, ElasticsearchConfig config) {
return Sniffer.builder(client)
.setSniffIntervalMillis((int) config.discovery.refreshInterval.toMillis())
.build();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public RequestConfig.Builder customizeRequestConfig(RequestConfig.Builder requestConfigBuilder) {
return requestConfigBuilder
.setConnectTimeout((int) config.connectionTimeout.toMillis())
.setSocketTimeout((int) config.socketTimeout.toMillis());
Copy link
Member

@yrodiere yrodiere Jul 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure it's still relevant nowadays, but you might want to call setConnectionRequestTimeout( 0 ) here to avoid requests being flagged as timed out even when they didn't time out.

See https://hibernate.atlassian.net/browse/HSEARCH-2681
See https://github.com/hibernate/hibernate-search/blob/9da99d7de8f346aa0ec81ae2ce0a1fa2c1657b2a/backend/elasticsearch/src/main/java/org/hibernate/search/backend/elasticsearch/client/impl/ElasticsearchClientFactoryImpl.java#L236

Copy link
Contributor

@loicmathieu loicmathieu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I added some minor comments but as it's mainly what I implemented in my original PR aprove it ;)

To answer you bullet list :

I fixed a few doc issues

As always ;) thanks for this!

I used the Feature class for the features

See my comment, I think it was disadvised now as it is complex to maintain and don't provides any advantage.

I cleaned up the package names for more consistency. Keep in mind that non-API runtime classes should be in the runtime subpackage

Yep, thanks

I added a description for the visible runtime extensions as it will be visible at code.quarkus.io.

I always forgot about this! Thanks

I changed the bean initialization a bit by avoiding the @PostConstruct pattern.

In the begining, I didn't use the Singleton scope so I eagerly create the client in a PostConstruct method. Now that is uses Singleton it's more simple like this.

Comment on lines +26 to +28
ELASTICSEARCH_REST_CLIENT_COMMON,
ELASTICSEARCH_REST_CLIENT,
ELASTICSEARCH_REST_HIGH_LEVEL_CLIENT,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought enumeration of the features inside th Feature class was something we want to avoid for now.
I remember some discussion about this last time I created an extension ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, well, no :).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure Alexey or David didn't agree with you but OK :)

@gsmet gsmet force-pushed the elasticsearch-clients-gsmet branch from c02eb2b to c107b74 Compare July 15, 2020 14:40
@gsmet
Copy link
Member Author

gsmet commented Jul 15, 2020

@loicmathieu some of Yoann's point can be dealt with in a followup but I think some of the issues need to be addressed before merging, especially the potential injection issue, the SSL not being fully included the password.get() and a few others.

The more cosmetic ones, I would say we can merge a first iteration and then build on it.

It would be nice if you could use fixup commits attached to the appropriate commit with git commit --fixup <sha> and then I will autosquash at the end.

It should be pretty easy as the commits are well split.

@gsmet
Copy link
Member Author

gsmet commented Jul 15, 2020

@loicmathieu I pushed the easy suggestions from Yoann, I let you play add the fixup commits to my branch? You should be authorized to. Ping me if not!

@loicmathieu
Copy link
Contributor

@loicmathieu some of Yoann's point can be dealt with in a followup but I think some of the issues need to be addressed before merging, especially the potential injection issue, the SSL not being fully included the password.get() and a few others.

OK, will do it. For the injection issue this is quickstart code is it really important ? I can use Vert.x JsonObject I think it's available everywhere but it will need Jackson. Or I can juts add a comment // never do this as it is a query injection ....

@gsmet
Copy link
Member Author

gsmet commented Jul 15, 2020

AFAIK you are already using Jackson in the doc so that would be fine.

@gsmet gsmet force-pushed the elasticsearch-clients-gsmet branch from d19542b to 3484146 Compare July 21, 2020 15:33
@@ -16,7 +17,7 @@
* The list of hosts of the Elasticsearch servers.
*/
@ConfigItem(defaultValue = "localhost:9200")
public List<String> hosts;
public List<InetSocketAddress> hosts;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the javadoc:

 * This class implements an IP Socket Address (IP address + port number)
 * It can also be a pair (hostname + port number), in which case an attempt
 * will be made to resolve the hostname.

Do you really want that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use InetSocketAddress in Quarkus for all the host:port combinations. I think, if I use getHostString() as I did, it does not resolve anything. At least that's what the Javadoc says.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked (statically) what java.net.InetSocketAddress#InetSocketAddress(java.lang.String, int) calls and ended up in java.net.InetAddress#getAddressesFromNameService... Might be worth a double-check.

loicmathieu and others added 5 commits July 21, 2020 18:16
It will be used by both the Hibernate Search - Elasticsearch extension
and the Elasticsearch client extensions.

Co-authored-by: Guillaume Smet <[email protected]>
@gsmet gsmet force-pushed the elasticsearch-clients-gsmet branch from 5243831 to ecf34a6 Compare July 21, 2020 16:17
@gsmet gsmet merged commit 498f1f0 into quarkusio:master Jul 22, 2020
@gsmet
Copy link
Member Author

gsmet commented Jul 22, 2020

Merged, thanks everyone!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/core area/dependencies Pull requests that update a dependency file area/documentation area/hibernate-search Hibernate Search area/infra-automation anything related to CI, bots, etc. that are used to automated our infrastructure release/noteworthy-feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants