v1 #97 (Draft)
cbizon wants to merge 2 commits into master from refactor
# Performance Refactor Proposal

This document describes, at a high level, the proposed refactoring of Translator at the start of the Performance phase. It does not describe all details of the refactoring, which will be jointly developed over time through discussions and decisions in the relevant working groups.
The goal of this document is to establish a broad, agreed-upon framework into which those details will fit, and to codify the discussions at the February 2025 Relay.

The plan is to complete the refactoring by Feb 28, 2026.

## ARAs to Shepherd

* We will create a shared platform for ARA implementation.
* The name of this platform is Shepherd.
* The workflow core of Shepherd will be implemented in Python.
* Individual Shepherd operations may vary in their implementation language or technology, though Python is preferred.
* Shepherd will be implemented as a workflow engine with shared operation components.
* Technical decisions related to the implementation of workflows are explicitly outside the scope of this document.
* ARAs will maintain individual TRAPI endpoints addressable at individual URLs, even if the underlying implementation is shared.
* Best effort will be made to reimplement ARAGORN, BTE, and ARAX on Shepherd.
* ARS will continue to exist and does not require modifications based on the outlined plan for Shepherd.
* It is possible to host non-Shepherd ARAs if desired, and this PR does not exclude the possibility.
* Shepherd will access Translator knowledge providers via the Retriever interface.
* ARAs can access ARA-specific data sources such as databases directly.
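
To make the phrase "workflow engine with shared operation components" concrete, here is a minimal sketch in Python. It is not the Shepherd design, which this document leaves to the working groups. Every name in it (`OPERATIONS`, `WORKFLOWS`, `run_workflow`, the operation names, and the `ara_one` / `ara_two` labels) is hypothetical, and the message is a plain dict rather than a real TRAPI message.

```python
from typing import Any, Callable, Dict, List

# Simplified stand-in for a TRAPI message (assumption: the real schema is richer).
Message = Dict[str, Any]
Operation = Callable[[Message], Message]

# Hypothetical registry of shared operations, keyed by name.
OPERATIONS: Dict[str, Operation] = {}


def operation(name: str) -> Callable[[Operation], Operation]:
    """Register a function as a shared operation under the given name."""
    def register(func: Operation) -> Operation:
        OPERATIONS[name] = func
        return func
    return register


@operation("lookup")
def lookup(message: Message) -> Message:
    # Placeholder: a real lookup would send the query to Retriever.
    return {**message, "results": ["result-b", "result-a"]}


@operation("sort_results")
def sort_results(message: Message) -> Message:
    return {**message, "results": sorted(message.get("results", []))}


@operation("limit_results")
def limit_results(message: Message) -> Message:
    # Keep only the first result; the limit of 1 is arbitrary for illustration.
    return {**message, "results": message.get("results", [])[:1]}


# Hypothetical per-ARA workflows: each ARA is an ordered list of shared operations.
WORKFLOWS: Dict[str, List[str]] = {
    "ara_one": ["lookup", "sort_results"],
    "ara_two": ["lookup", "sort_results", "limit_results"],
}


def run_workflow(ara: str, message: Message) -> Message:
    """Apply the named ARA's operations to the message, in order."""
    for name in WORKFLOWS[ara]:
        message = OPERATIONS[name](message)
    return message
```

In this sketch, two ARAs differ only in the list of operations they run, which is one way the "individual endpoints, shared implementation" idea could play out. An operation written in another language could sit behind the same callable interface, for example as a thin wrapper around an HTTP call.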
## KPs to Retriever / DogPark

* Translator will implement a common data platform called DogPark.
* The details of DogPark implementation, including data architecture, database implementation, tiers, or interfaces, are outside the scope of this document.
* DogPark will host all Translator Knowledge Providers (KPs), including providing access to external APIs.
* Translator will implement a query interface to DogPark called Retriever.
* Retriever will implement at least an async TRAPI interface, and may develop further interfaces in the future.
* The TRAPI interface will respond to lookup queries, and will implement all Phase 2 requirements of TRAPI KPs, such as subclass inference, canonical edge directions, and others as described in the TranslatorArchitecture document.
* The detailed implementation of Retriever is outside the scope of this document.
* Translator will coordinate knowledge ingests to reduce redundancy and modeling variation.
* Translator will move towards a common declarative ingest pipeline.
* Some ingests will be non-declarative or will generate new knowledge via analysis; these will require special handling outside of the common ingest pipeline.
* Data hosted in DogPark will be pre-normalized to the extent possible, but this document does not specify the details of normalization, which will be agreed upon at a later date.
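
As an illustration of the kind of lookup query an async TRAPI interface would receive, the sketch below builds a one-hop TRAPI query graph and wraps it in an async request body with a callback URL. It only constructs the payload and sends nothing, since Retriever's URL and detailed behavior are not specified here. The helper names, the `EXAMPLE:0001` identifier, and the callback URL are placeholders, not real values.

```python
import json
from typing import Any, Dict, List


def build_one_hop_query(
    subject_ids: List[str], object_category: str, predicate: str
) -> Dict[str, Any]:
    """Build a TRAPI message containing a single-edge (one-hop) query graph."""
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    "n0": {"ids": subject_ids},
                    "n1": {"categories": [object_category]},
                },
                "edges": {
                    "e0": {
                        "subject": "n0",
                        "object": "n1",
                        "predicates": [predicate],
                    }
                },
            }
        }
    }


def build_async_request(query: Dict[str, Any], callback_url: str) -> Dict[str, Any]:
    """Add the callback URL that an async TRAPI query uses to return its response."""
    return {"callback": callback_url, **query}


# Placeholder identifier and callback URL (assumptions for illustration only).
query = build_one_hop_query(["EXAMPLE:0001"], "biolink:Disease", "biolink:treats")
request_body = build_async_request(query, "https://example.org/callback")
print(json.dumps(request_body, indent=2))
```

Under the proposal, requirements such as subclass inference would be handled on the Retriever side, so a caller would not need to expand `EXAMPLE:0001` into its subclasses before sending a query like this.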
## Node Components

* Several components do not contain edges, but only nodes. At present, these components include (but are not necessarily limited to) SRI Node Normalizer, KG2 Node Synonymizer, Node Annotator, and Name Resolver.
* These components will agree on a common set of identifier equivalences to be used across all tools, and in the normalization of DogPark knowledge.
* Node Normalizer will implement refactorings to reduce its cost and may reduce performance to achieve this goal.
* Node Normalizer will, however, still be used at query time to provide canonical identifiers for external API sources.
* Translator will investigate the merger of any or all of these tools based upon their use in the proposed architecture, but this document does not specify any decision.
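
The idea of a common set of identifier equivalences applied during normalization can be sketched as follows. This is an assumption-laden illustration, not the Node Normalizer's actual data model or API: the function names are hypothetical, the identifiers are placeholders, and choosing the first member of each equivalence set as the canonical identifier is an arbitrary rule picked for the example.

```python
from typing import Dict, Iterable, List, Tuple

# An edge is reduced to (subject, predicate, object) for this sketch.
Edge = Tuple[str, str, str]


def build_canonical_map(cliques: Iterable[List[str]]) -> Dict[str, str]:
    """Map every identifier in each equivalence set to that set's first member."""
    canonical: Dict[str, str] = {}
    for clique in cliques:
        preferred = clique[0]  # assumption: first identifier is the preferred one
        for curie in clique:
            canonical[curie] = preferred
    return canonical


def normalize_edge(edge: Edge, canonical: Dict[str, str]) -> Edge:
    """Rewrite an edge's subject and object to canonical identifiers.

    Identifiers with no known equivalence are passed through unchanged.
    """
    subject, predicate, obj = edge
    return (canonical.get(subject, subject), predicate, canonical.get(obj, obj))


# Placeholder equivalence sets shared by all tools (hypothetical identifiers).
CLIQUES = [
    ["EX_A:1", "EX_B:10", "EX_C:100"],
    ["EX_A:2", "EX_B:20"],
]
CANONICAL = build_canonical_map(CLIQUES)

edges = [("EX_B:10", "biolink:related_to", "EX_B:20")]
normalized = [normalize_edge(e, CANONICAL) for e in edges]
```

If every tool draws on the same equivalence sets, knowledge normalized at ingest time and identifiers normalized at query time (for example, results from external APIs) would resolve to the same canonical nodes.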
This is something that David K brought up: Shepherd will default to using the Retriever interface for any lookups (and we will strongly recommend this interface), but it's entirely possible that an ARA is able to access other knowledge providers or services outside of Retriever.
Thanks @maximusunc. I added a new line saying that ARAs can access their own internal resources, such as databases. @dkoslicki, do you think this captures it, or should it be more expansive?
The databases part makes sense. And the line

> It is possible to host non-Shepherd ARAs if desired and this PR does not exclude the possibility.

also leaves open the possibility of using an ARAX-style beam search for lookups instead of Retriever directly, like we talked about on the call. So both of these capture it in my mind.