v1 #97

Draft: wants to merge 2 commits into base: master
43 changes: 43 additions & 0 deletions PerformanceRefactor.md
@@ -0,0 +1,43 @@
# Performance Refactor Proposal

This document describes the proposed refactoring of Translator at the start of the Performance phase at a high level. This does not describe all details of the refactoring, which will be jointly developed over time through discussions and decisions in the relevant working groups.
The goal of this document is to establish a broad agreed-upon framework into which those details will fit, and codify the discussions at the February 2025 Relay.

The plan is to complete the refactoring by Feb 28, 2026.

## ARAs to Shepherd

* We will create a shared platform for ARA implementation
* The name of this platform is Shepherd
* The workflow core of Shepherd will be implemented in Python
* Individual Shepherd operations may vary in their implementation language or technology, though Python is preferred
* Shepherd will be implemented as a workflow engine with shared operation components
* Technical decisions related to the implementation of workflows are explicitly outside the scope of this document
* ARAs will maintain individual TRAPI endpoints addressable at individual URLs, even if the underlying implementation is shared.
* Best effort will be made to reimplement ARAGORN, BTE, and ARAX on Shepherd.
* ARS will continue to exist and does not require modifications based on the outlined plan for Shepherd.
* It is possible to host non-Shepherd ARAs if desired, and this PR does not exclude the possibility.
* Shepherd will access Translator knowledge providers via the Retriever interface.

This is something that David K brought up: Shepherd will default to using the Retriever interface for any lookups (and we will strongly recommend this interface), but it's entirely possible for an ARA to access other knowledge providers or services outside of Retriever.

Collaborator Author

Thanks @maximusunc, I added a new line saying that ARAs can access their own internal resources, such as databases. @dkoslicki, do you think this captures it, or should it be more expansive?

Member

The databases part makes sense. And the line "It is possible to host non-Shepherd ARAs if desired and this PR does not exclude the possibility." also leaves open the possibility of using an ARAX-style beam search for lookups instead of Retriever directly, like we talked about on the call. So both of these capture it in my mind.

* ARAs can access ARA-specific data sources such as databases directly.
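
As a rough illustration of "a workflow engine with shared operation components" behind per-ARA endpoints, the following is a minimal sketch only. Workflow implementation is explicitly out of scope for this proposal, so every name here (`OPERATIONS`, `ARA_WORKFLOWS`, `run_workflow`, the endpoint paths, and the placeholder operations) is a hypothetical assumption and not a Shepherd design decision.

```python
from typing import Any, Callable, Dict, List

# Hypothetical types: a message is a plain dict; an operation transforms it.
Message = Dict[str, Any]
Operation = Callable[[Message], Message]

# Shared registry of operations that any ARA workflow may reuse (illustrative).
OPERATIONS: Dict[str, Operation] = {}


def operation(name: str) -> Callable[[Operation], Operation]:
    """Register a function as a shared operation under the given name."""
    def register(func: Operation) -> Operation:
        OPERATIONS[name] = func
        return func
    return register


@operation("lookup")
def lookup(message: Message) -> Message:
    # Placeholder standing in for a call through the Retriever interface.
    return {**message, "results": ["placeholder-result"]}


@operation("score")
def score(message: Message) -> Message:
    # Placeholder scoring: assign every result the same score.
    results = message.get("results", [])
    return {**message, "scores": {r: 1.0 for r in results}}


# Each ARA keeps its own endpoint while composing the same shared operations.
# The paths below are made up for the example.
ARA_WORKFLOWS: Dict[str, List[str]] = {
    "/aragorn/query": ["lookup", "score"],
    "/example_ara/query": ["lookup"],
}


def run_workflow(endpoint: str, message: Message) -> Message:
    """Apply the endpoint's operations to the message in order."""
    for op_name in ARA_WORKFLOWS[endpoint]:
        message = OPERATIONS[op_name](message)
    return message
```

The point of the sketch is only that two endpoints can differ in which shared operations they chain together while the underlying engine and operation code stay common.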

## KPs to Retriever / DogPark

* Translator will implement a common data platform called DogPark.
* The details of DogPark implementation, including data architecture, database implementation, tiers, and interfaces, are outside the scope of this document.
* DogPark will host all Translator Knowledge Providers (KPs), including providing access to external APIs.
* Translator will implement a query interface to DogPark called Retriever.
* Retriever will implement at least an async TRAPI interface, and may develop further interfaces in the future.
* The TRAPI interface will respond to lookup queries, and will implement all Phase 2 requirements of TRAPI KPs such as subclass inference, canonical edge directions, and others as described in the TranslatorArchitecture document.
* The detailed implementation of Retriever is outside the scope of this document.
* Translator will coordinate knowledge ingests to reduce redundancy and modeling variation.
* Translator will move towards a common declarative ingest pipeline.
* Some ingests will require non-declarative processing or will generate new knowledge via analysis; these will require special handling outside of the common ingest pipeline.
* Data hosted in DogPark will be pre-normalized to the extent possible, but this document does not specify the details of normalization, which will be agreed upon at a later date.
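
For readers unfamiliar with what an async TRAPI lookup looks like, here is a minimal sketch of building a one-hop request body. The helper name `build_async_lookup`, the callback URL, and the example CURIE and Biolink terms are assumptions chosen for illustration. The field layout follows the general shape of a TRAPI asynchronous query (a `callback` plus a `message` containing a `query_graph`) and should be checked against the TRAPI version Retriever ends up supporting.

```python
import json
from typing import Any, Dict


def build_async_lookup(
    subject_curie: str,
    object_category: str,
    predicate: str,
    callback_url: str,
) -> Dict[str, Any]:
    """Build a one-hop lookup request body (hypothetical helper)."""
    return {
        # Where the service should send the response when it is ready.
        "callback": callback_url,
        "message": {
            "query_graph": {
                "nodes": {
                    "n0": {"ids": [subject_curie]},
                    "n1": {"categories": [object_category]},
                },
                "edges": {
                    "e0": {
                        "subject": "n0",
                        "object": "n1",
                        "predicates": [predicate],
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    body = build_async_lookup(
        subject_curie="MONDO:0005148",          # example CURIE only
        object_category="biolink:ChemicalEntity",
        predicate="biolink:related_to",
        callback_url="https://example.org/callback",  # placeholder URL
    )
    print(json.dumps(body, indent=2))
```

Nothing here specifies how Retriever will be implemented; it only shows the kind of request an ARA would send to any TRAPI-style lookup service.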

## Node Components

* Several components do not contain edges, but only nodes. At present, these components include (but are not necessarily limited to) SRI Node Normalizer, KG2 Node Synonymizer, Node Annotator, and Name Resolver.
* These components will agree on a common set of identifier equivalences to be used across all tools, and in the normalization of DogPark knowledge.
* Node Normalizer will implement refactorings to reduce its cost and may reduce performance to achieve this goal.
* Node Normalizer will, however, still be used at query time to provide canonical identifiers for external API sources.
* Translator will investigate the merger of any or all of these tools based upon their use in the proposed architecture, but this document does not specify any decision.
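
To make "a common set of identifier equivalences" concrete, the sketch below shows one simple way a shared equivalence table could map any known identifier to a canonical one. The identifiers, the rule that the first member of each set is canonical, and the function names are all invented for the example; the actual equivalences and normalization rules are to be agreed upon later.

```python
from typing import Dict, List

# Hypothetical equivalence sets with made-up identifiers. By assumption, the
# first identifier in each set is treated as the canonical one.
EQUIVALENCE_SETS: List[List[str]] = [
    ["EX:A1", "OTHER:17", "ALT:x"],
    ["EX:B2", "OTHER:42"],
]


def build_canonical_index(sets: List[List[str]]) -> Dict[str, str]:
    """Map every identifier in every set to that set's canonical identifier."""
    index: Dict[str, str] = {}
    for members in sets:
        canonical = members[0]
        for identifier in members:
            index[identifier] = canonical
    return index


CANONICAL: Dict[str, str] = build_canonical_index(EQUIVALENCE_SETS)


def normalize(identifier: str) -> str:
    """Return the canonical identifier, or the input unchanged if unknown."""
    return CANONICAL.get(identifier, identifier)
```

If every node component and the DogPark ingest used the same table, data could be normalized once at ingest time, leaving query-time normalization for identifiers that arrive from external API sources.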