-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Discuss] Core Foundation Code Refactoring #5910
Comments
I think this is in line with the work started in #1838. Tagging it here for reference. |
I think the proposed refactor is a huge undertaking. I agree directionally that it's a good thing, however we are still refactoring a massive monolith which seems like a ton of work to create extension points all while cutting out deep dependencies. Have you considered an alternate strategy where we'd expose new interfaces on top of the current mess, export them into the SDK, build new and existing functionality as clean extensions, fold the default implementation (as an extension) back in the distribution, and finally delete the old implementation? If we do this enough times we'd end up with your idea of motherboard. (Note that the SDK is pulling server as a dependency is a shortcut, and either way it will only do so at build time and not at runtime, if I am not mistaken - @saratvemulapalli can confirm). |
Of course it is. We can do this incrementally without shimming...
No because shimming will create intermediate interfaces we ultimately may not want exposed in the sdk. Incrementally refactoring core to the proper libraries not only gives us control within this repository but enables us the ability to conservatively hold classes back as |
Thanks @nknize. My concern is that we continue to overly rely on internal (plugin) interfaces where we shouldn't. @saratvemulapalli brought this up recently around the efforts of moving security into core, for example. |
Let's call that out on a case by case basis where it continues to occur. Chances are we inherited this and are trying to move away from it. The more I get into the throws of reorganizing the house the more I lean into the java plugin sdk being a |
The extensions SDK is https://github.com/opensearch-project/opensearch-sdk-java, and I think it's a good idea if it only relies on whatever is in |
I updated the issue with the following definitions:
|
Thanks @nknize for opening this up. I like it!! +1 to |
Coming out of #6875 (comment) Objective: Eliminate downstream dependencies on Option 1: Refactor OpenSearch foundation classes to Option 2: Refactor OpenSearch foundation classes that do not have third-party dependencies to |
@nknize Just to make sure I fully understand the objective, concretely this will mean all |
Not answering the question, but this is my understanding as per #6875 (comment) |
Correct!
💯 that's the approach I like...progress toward splitting these into their proper home. It may not be perfect, but it helps lean out the dependencies and gives us some clear understanding where all of the modules live. I put this rough diagram together to capture the spirit of the modularity vision. The reality is most of the concrete code lives in :server... 😞 I'm trying to split that out to get us closer to this extensible future so we really can start looking at alternative replication models, or translog implementations. |
@dbwiddis , @owaiskazi19 thoughts? |
Is your feature request related to a problem? Please describe.
Elasticsearch never supported a third-party plugin ecosystem. Elasticsearch plugins were implemented as a mechanism to enable internal Elastic developers the ability to quickly write and release independent and decoupled features across a growing startup organization but within a monolithic repo. Third party developers (e.g., ODFE, independent developers) exploited this plugin framework to write custom "untrusted" / "second class" plugins for elasticsearch in separate multi-repositories. When OpenSearch forked, we inherited this disorganized and poorly documented set of plugin interfaces, and quickly (some might say, prematurely) marketed it for downstream usage with strict "semver" guarantees (that actually aren't enforced).
To fix this, OpenSearch is striving for (well supported) modularity, in the same spirit, to enable rapid development of new decoupled features (e.g., replication strategies, decoupling translog, streaming index api).
OpenSearch is also striving for "extensibility" through mechanisms like opensearch-sdk to enable community and third-party development / contribution of plugins, modules, and libraries.
The problem, however, is multi-fold. For instance, transitive / bloated dependencies that are not well encapsulated. The opensearch-sdk, along with countless other plugins, pull server as a dependency which transitively depends on 29 other libraries whether an extension / plugin, needs them or not. This exposes the entire OpenSearch internals to downstream plugins giving the ability to do potentially (most likely unknowingly) malicious or dangerous things.
Additionally, as core is being developed, there is no clear definition of what foundation or concrete classes should go where (e.g.,
:server
vs:module
vs:plugin
vs:libs
). For example,DocValueFormat
is an interface inside the:server
modulesearch
namespace used to define custom DocValueFormatters. It would be useful if this were in theopensearch-core
lib so a plugin wouldn't have to depend on the entire server jar just to pull in dependency foundation interfaces do define their own DocValueFormatter.Finally, and one of the most important justifications for this refactor effort, we all know JSM is deprecated with an API targeted to be replaced with noops sometime in the future. There is no drop in replacement mostly because java has moved away from the world of client side applets into the world of server side trusted code. Shallow sandboxing is the new model. Thus, the interim recommendation is to leverage the "new" language design features (since Java 9) of JEP 403 and Project Jigsaw (JPMS). OpenSearch is unable to use these design features until we eliminate split-package across the core namespaces (e.g., o.o.common, o.o.bootstrap).
At present the code organization is as follows:
libs/
- foundation classes that can be used in any ofmodules/
,plugins/
,server/
,qa/
,sandbox/
,test/
. (e.g.,cli
,core
,nio
,secure-sm
,x-content
, ...).modules/
- "first class" features and capabilities considered part of "the core" / "min distribution". Installed by default. (e.g.,geo
,ingest-common
,lang-painless
,percolator
,reindex
, ...)plugins/
- "first class" features and capabilities not considered part of the "the core" / "min distribution". Not installed by default. (e.g.,analysis-icu
,ingest-attachment
,repository-s3
,store-smb
, ...)server/
- "the core" implementation, architected into logical namespaces (e.g.,index
,ingest
,repositories
,search
,snapshots
, ...)Due to the tightly coupled nature of the plugins and module plugins to the
:server
module big feature development efforts (e.g., replication strategies, decoupling translog, streaming index api) spend a significant amount of time unwinding foundation:server
classes from concrete implementation details just to extend them for new use cases (e.g., local vs remote storage). The unwinding typically leads to more foundation classes remaining in the:server
module only because there is no formal distinction of where these foundation classes should go. This problem exacerbated by the limitation that the build is configured such that library modules may not depend on other library modules exceptopensearch-core
. This heavily restricts opensearch from using libraries to trim down the bloated:server
module (example provided below).Describe the solution you'd like
Decoupling foundation class and interface contracts from the
:server
module into appropriate separable JPMS enabled libraries as defined below::libs:opensearch-common
- Reusable java components only (e.g., collections, generic data structures). This library is dependency free. It may not depend on any other opensearch or third-party libraries unless it's needed forbuild-tools
only.:libs:opensearch-core
- The OpenSearch foundation classes, interface contracts / definitions, and base classes implemented by:server
and other plugins.:libs:opensearch-*
- Foundation libraries implemented by modules and plugins. May depend on other opensearch libraries but not on modules or plugins.:modules:*
- Plugins that are installed by default as part of the minimum distribution:plugins:*
- Plugins that are not installed by default but shipped as part of the min distributionopensearch-project/Plugin-* - External OpenSearch plugins that are not installed by default and shipped as part of the bundle
We will ensure that the core libraries (e.g.,
opensearch-core
) and concrete module implementations (e.g.,:modules:mapper-extras
) or library implementations (e.g.,:libs:x-content
) do not not have package split-brains that prevent using JPMS for strong encapsulation, and breaking SemVer guarantees.For clarity, here's an example where the cross library dependency limitation currently restricts a big part of modularization:
Aggregations
Third party libraries, plugins, modules, etc, want to build new aggregations all the time. We provide this ability and it's super powerful. To do so is quite boiler plate: 1. If the CoreValuesSourceType library doesn't already provide the requisite ValueSource, write the needed ValuesSourceType the new aggregation depends on, 2. write the Aggregator, 3. write the Aggregator Factory, 4. write the AggregationBuilder, 5. register the aggregation through your plugin using the AggregationSpec
As you can see in this process there is a large number of abstract classes and interfaces required to support implementing a new concrete aggregation anywhere in the "ecosystem". Organizationally, these base classes are spread across the aggregations and aggregations.support namespace, along with the mechanisms needed to implement the aggregation phase(s) in the SearchModule and SearchService class implementations. This means for anyone wanting to build a custom aggregation they would take a
build.gradle
dependency on"org.opensearch:opensearch:${opensearchVersion}"
which pulls in a bloated server jar containing much more than the classes and dependencies they need to build their aggregation.I think most on this project agree this is a modularization anti-pattern, so to remedy we're striving to make the codebase much more modular. So how do we achieve that here with this clumsy aggregation foundation and user facing "API" (which isn't exactly well documented, or implemented, or supported)? At first pass you might say, "hey, how about we stick with the intent of the libraries and modules design and refactor the generics and base classes from the bloated
:server
module to a new:opensearch-aggregations
library with the concrete implementation in a new:aggs-common
module that continue to use thesearch.aggregations.*
namespace? I'd agree with this, in fact I'd go a step further and say this is where I think we should go with, not just aggregations, most of the big opensearch mechanisms (field mappers, field types, doc value types, field data, etc). In the end the:server
module should be nothing more than the "motherboard" providing the concrete implementation of all the opensearch base components within the libraries and concrete extendable mechanisms provided by the modules and optional plugins.But if you embark down that journey you will quickly find the cross library dependency restriction blocks much of this objective and the tight dependency requirement of the
:server
module with other modules quickly leads to JarHell issues. Take the aggregation decoupling use case, for example. Aggregations, like field mappers, are serializable and therefore depend on XContent in order to implement the ToXContentFragment logic. So how about refactoring all the XContent base classes to the:opensearch-x-content
library then? That works until you refactor any base interfaces that depend on XContent to a library other thanopensearch-core
, because libraries cannot depend on each other (e.g., any new library cannot take a dependency on both core and x-content).So that leaves us with a few options:
opensearch-core
library and the implementation details to other libraries, modules, and pluginsWith this refactoring effort we chose Option 4 because it not only fits the modularization objective but after the aggregation framework is refactored to a new module (and supporting library) users in the aggregation scenario above would only need to take a dependency on the opensearch-core and a (soon coming) opensearch-aggregations library to implement a new concrete aggregation in their own plugin.
Describe alternatives you've considered
Additional context
relates #2447
Also see conversation in #5902 for additional discussion.
The text was updated successfully, but these errors were encountered: