Patch nrslv resp #337

Merged: 48 commits, merged Jan 25, 2024

Commits (48)
ce35cd3
fix response from nn
YaphetKG Dec 13, 2023
149be9f
norm returned values from make_request
YaphetKG Dec 13, 2023
24b34e9
correcting jsonable to recursively serialize sub objects
YaphetKG Dec 19, 2023
785b789
correcting jsonable to recursively serialize sub objects
YaphetKG Dec 19, 2023
fec990a
correcting jsonable to recursively serialize sub objects
YaphetKG Dec 19, 2023
edfff4f
correcting jsonable to recursively serialize sub objects
YaphetKG Dec 19, 2023
f01844a
parameterize all identifier inner vars;
YaphetKG Dec 19, 2023
c70940f
parameterize everything for init from json form
YaphetKG Dec 19, 2023
a95bd2e
probably not a revelation but making everything optional in initial…
YaphetKG Dec 19, 2023
f3fca0f
probably not a revelation but making everything optional in initial…
YaphetKG Dec 19, 2023
1bd901f
missed description
YaphetKG Dec 19, 2023
3f4e334
normalize search test in identifier
YaphetKG Dec 19, 2023
227ad4a
https://github.com/TranslatorSRI/NameResolution/issues/129
YaphetKG Dec 20, 2023
5888094
avoid deep copy
YaphetKG Dec 20, 2023
527fbb8
see if this helps
YaphetKG Dec 20, 2023
17893a0
shallow copy and dump
YaphetKG Dec 20, 2023
1aa475f
logging for crawler
YaphetKG Dec 20, 2023
b5405eb
reverting cause of memory leak
YaphetKG Dec 21, 2023
f1950e0
debug message for tranql
YaphetKG Dec 21, 2023
7afd258
Merge branch 'develop' into patch-nrslv-resp
YaphetKG Jan 3, 2024
9ae8e0a
remove annotate commented out code, backdrop python min requirement
YaphetKG Jan 4, 2024
a18670e
feat: updated elasticsearch auth protocol to latest version
braswent Jan 4, 2024
4c4977d
feat: change annotator config to allow for different configs
braswent Jan 4, 2024
4eb6d2e
pass down config , no global access
YaphetKG Jan 4, 2024
0147fae
remove `-` from annotator names
YaphetKG Jan 4, 2024
80e35ae
normalize args for sapbert so it becomes easier parsing from env
YaphetKG Jan 4, 2024
096ba47
Sorted lists for json serialization for parser and annotator outputs
mbacon-renci Jan 11, 2024
0b7b51f
Reverted jsonable, sorted lists on assignment and change, rather than…
mbacon-renci Jan 16, 2024
0bb7085
Trying bumps in Docker base images
mbacon-renci Jan 17, 2024
ef0b74d
Adding jsonpickle to requirements.txt
mbacon-renci Jan 17, 2024
ebf9078
Moving required python version back to 3.11.
mbacon-renci Jan 22, 2024
56b85df
Changing image back to 3.11 as well
mbacon-renci Jan 22, 2024
8834423
Backing up redis image change to see if I can get dug auto-build to w…
mbacon-renci Jan 22, 2024
022f698
Build all branches for testing, pushing only to docker. Fix tag bypas…
joshua-seals Jan 23, 2024
ef8b721
Testing alpine to fix trivy error
joshua-seals Jan 23, 2024
e16a347
Vuln confirmed in image, new docker image test
joshua-seals Jan 23, 2024
5be0195
Is buildcache causing trivy failures?
joshua-seals Jan 23, 2024
d17578d
Re-enabling cache after testing
joshua-seals Jan 23, 2024
d1ff3c9
Revert to older trivy release
joshua-seals Jan 23, 2024
96f7338
trivy scan update
joshua-seals Jan 23, 2024
5bee00d
adding pytest asyncio
YaphetKG Jan 24, 2024
9cb89ca
fix tests
YaphetKG Jan 24, 2024
64f3cb6
fix annotator init
YaphetKG Jan 24, 2024
15cccfe
fix all the tests
YaphetKG Jan 24, 2024
f3d9411
Forced Python 3.11
mbacon-renci Jan 24, 2024
d7257df
bump docker image version to 0 vulns
YaphetKG Jan 24, 2024
92cec85
Merge branch 'sort_pickle_lists' into patch-nrslv-resp
YaphetKG Jan 24, 2024
275abcb
zero again 0_o
YaphetKG Jan 24, 2024

Files changed
.github/workflows/build-push-release.yml (2 changes: 1 addition, 1 deletion)

@@ -18,7 +18,7 @@ on:
       - .dockerignore
       - .githooks
     tags-ignore:
-      - 'v[0-9]+.[0-9]+.*'
+      - '*'
 jobs:
   build-push-release:
     runs-on: ubuntu-latest
.github/workflows/code-checks.yml (83 changes: 44 additions, 39 deletions)

@@ -66,45 +66,6 @@ jobs:
           # flake8 --ignore=E,W --exit-zero .
         continue-on-error: true
 
-# ############################## build-vuln-test ##############################
-# build-vuln-test:
-#   # needs: flake8-linter
-#   runs-on: ubuntu-latest
-#   steps:
-#     - uses: actions/checkout@v3
-
-#     - name: Set up Docker Buildx
-#       uses: docker/setup-buildx-action@v3
-#       with:
-#         driver-opts: |
-#           network=host
-
-#     - name: Login to DockerHub
-#       uses: docker/login-action@v3
-#       with:
-#         username: ${{ secrets.DOCKERHUB_USERNAME }}
-#         password: ${{ secrets.DOCKERHUB_TOKEN }}
-#         logout: true
-
-#     # Notes on Cache:
-#     # https://docs.docker.com/build/ci/github-actions/examples/#inline-cache
-#     - name: Build Container
-#       uses: docker/build-push-action@v5
-#       with:
-#         context: .
-#         push: false
-#         load: true
-#         tag: ${{ github.repository }}:vuln-test
-#         cache-from: type=registry,ref=${{ github.repository }}:buildcache
-#         cache-to: type=registry,ref=${{ github.repository }}:buildcache,mode=max
-#     ####### Run for Fidelity ######
-#     - name: Run Trivy vulnerability scanner
-#       uses: aquasecurity/trivy-action@master
-#       with:
-#         image-ref: '${{ github.repository }}:vuln-test'
-#         severity: 'CRITICAL,HIGH'
-#         exit-code: '1'
-
 ################################### PYTEST ###################################
   pytest:
     runs-on: ubuntu-latest

@@ -145,3 +106,47 @@ jobs:
       - name: Test with Bandit
         run: |
           bandit -r src -n3 -lll
+
+############################## test-image-build ##############################
+  test-image-build:
+    runs-on: ubuntu-latest
+    # if: ${{ github.actor == 'dependabot[bot]' }}
+    steps:
+      - uses: actions/checkout@v3
+
+      - name: Set short git commit SHA
+        id: vars
+        run: |
+          echo "short_sha=$(git rev-parse --short ${{ github.sha }})" >> $GITHUB_OUTPUT
+      # https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/
+
+      - name: Confirm git commit SHA output
+        run: echo ${{ steps.vars.outputs.short_sha }}
+
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v3
+
+      - name: Login to DockerHub
+        uses: docker/login-action@v3
+        with:
+          username: ${{ secrets.DOCKERHUB_USERNAME }}
+          password: ${{ secrets.DOCKERHUB_TOKEN }}
+          logout: true
+
+      - name: Parse Github Reference Name
+        id: branch
+        run: |
+          REF=${{ github.ref_name }}
+          echo "GHR=${REF%/*}" >> $GITHUB_OUTPUT
+
+      # Notes on Cache:
+      # https://docs.docker.com/build/ci/github-actions/examples/#inline-cache
+      - name: Build Container
+        uses: docker/build-push-action@v5
+        with:
+          context: .
+          push: true
+          tags: |
+            ${{ github.repository }}:test_${{ steps.branch.outputs.GHR }}
+          cache-from: type=registry,ref=${{ github.repository }}:buildcache
+          cache-to: type=registry,ref=${{ github.repository }}:buildcache,mode=max
Dockerfile (10 changes: 7 additions, 3 deletions)

@@ -3,11 +3,15 @@
 # A container for the core semantic-search capability.
 #
 ######################################################
-FROM python:3.12.0-alpine3.18
+FROM python:3.12.1-alpine3.19
 
 
 # Install required packages
 RUN apk update && \
-  apk add g++ make
+    apk add g++ make
+
+#upgrade openssl \
+RUN apk add openssl=3.1.4-r4
+
 RUN pip install --upgrade pip
 # Create a non-root user.

@@ -31,4 +35,4 @@ RUN make install
 RUN make install.dug
 
 # Run it
-ENTRYPOINT dug
+ENTRYPOINT dug
docker-compose.yaml (2 changes: 1 addition, 1 deletion)

@@ -56,7 +56,7 @@ services:
   ##
   #################################################################################
   elasticsearch:
-    image: docker.elastic.co/elasticsearch/elasticsearch:8.5.2
+    image: docker.elastic.co/elasticsearch/elasticsearch:8.11.3
    networks:
      - dug-network
    environment:
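For context on the Elasticsearch bump, below is a minimal sketch (not part of this diff) of how an async client from the elasticsearch[async] package pinned in requirements.txt can authenticate against an 8.x node. The host name, user, and the ELASTIC_PASSWORD environment variable are assumptions based on the compose file and the Config defaults, not settings taken from this PR.

# Minimal sketch: authenticate to the Elasticsearch 8.x service with basic auth.
# Host, user, and environment variable names are assumptions, not read from this diff.
import asyncio
import os

from elasticsearch import AsyncElasticsearch


async def ping_elastic() -> bool:
    es = AsyncElasticsearch(
        hosts=["http://elasticsearch:9200"],  # assumed compose service name and port
        basic_auth=("elastic", os.environ.get("ELASTIC_PASSWORD", "changeme")),
    )
    try:
        return await es.ping()
    finally:
        await es.close()


if __name__ == "__main__":
    print(asyncio.run(ping_elastic()))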
requirements.txt (4 changes: 3 additions, 1 deletion)

@@ -6,13 +6,15 @@ elasticsearch[async]==8.5.2
 gunicorn
 itsdangerous
 Jinja2
+jsonpickle
 jsonschema
 MarkupSafe
 ormar
 mistune
 pluggy
 pyrsistent
 pytest
+pytest-asyncio
 pytz
 PyYAML
 requests

@@ -26,4 +28,4 @@ click
 httpx
 linkml-runtime==1.6.0
 bmt==1.1.0
-urllib3
+urllib3
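pytest-asyncio, added above and installed in the "adding pytest asyncio" commit, lets the suite define tests as coroutines and await them directly. A minimal sketch of the pattern, using a made-up coroutine rather than anything from this repository:

# Minimal pytest-asyncio sketch; fetch_status is a hypothetical coroutine,
# not something defined in this repository.
import asyncio

import pytest


async def fetch_status() -> int:
    await asyncio.sleep(0)  # stand-in for real async I/O
    return 200


@pytest.mark.asyncio
async def test_fetch_status():
    assert await fetch_status() == 200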
setup.cfg (2 changes: 1 addition, 1 deletion)

@@ -17,7 +17,7 @@ classifiers =
 package_dir =
     = src
 packages = find:
-python_requires = >=3.12
+python_requires = >=3.10
 include_package_data = true
 install_requires =
     elasticsearch==8.5.2
src/dug/cli.py (2 changes: 1 addition, 1 deletion)

@@ -55,7 +55,7 @@ def get_argparser():
        '-a', '--annotator',
        help='Annotator used to annotate identifiers in crawl file',
        dest="annotator_type",
-        default="annotator-monarch"
+        default="monarch"
    )
 
    crawl_parser.add_argument(
src/dug/config.py (137 changes: 83 additions, 54 deletions)

@@ -9,8 +9,9 @@
 @dataclass
 class Config:
     """
-    TODO: Populate description
+    TODO: Populate description
     """
+
     elastic_password: str = "changeme"
     redis_password: str = "changeme"
 

@@ -27,74 +28,102 @@ class Config:
     nboost_port: int = 8000
 
     # Preprocessor config that will be passed to annotate.Preprocessor constructor
-    preprocessor: dict = field(default_factory=lambda: {
-        "debreviator": {
-            "BMI": "body mass index"
-        },
-        "stopwords": ["the"]
-    })
-
+    preprocessor: dict = field(
+        default_factory=lambda: {
+            "debreviator": {"BMI": "body mass index"},
+            "stopwords": ["the"],
+        }
+    )
+    annotator_type: str = "monarch"
     # Annotator config that will be passed to annotate.Annotator constructor
-    annotator: dict = field(default_factory=lambda: {
-        "url": "https://api.monarchinitiative.org/api/nlp/annotate/entities?min_length=4&longest_only=false&include_abbreviation=false&include_acronym=false&include_numbers=false&content="
-    })
+    annotator_args: dict = field(
+        default_factory=lambda: {
+            "monarch": {
+                "url": "https://api.monarchinitiative.org/api/nlp/annotate/entities?min_length=4&longest_only=false&include_abbreviation=false&include_acronym=false&include_numbers=false&content="
+            },
+            "sapbert": {
+                "classification_url": "https://med-nemo.apps.renci.org/annotate/",
+                "annotator_url": "https://babel-sapbert.apps.renci.org/annotate/",
+            },
+        }
+    )
 
     # Normalizer config that will be passed to annotate.Normalizer constructor
-    normalizer: dict = field(default_factory=lambda: {
-        "url": "https://nodenormalization-dev.apps.renci.org/get_normalized_nodes?conflate=false&description=true&curie="
-    })
+    normalizer: dict = field(
+        default_factory=lambda: {
+            "url": "https://nodenormalization-dev.apps.renci.org/get_normalized_nodes?conflate=false&description=true&curie="
+        }
+    )
 
     # Synonym service config that will be passed to annotate.SynonymHelper constructor
-    synonym_service: dict = field(default_factory=lambda: {
-        "url": "https://name-resolution-sri.renci.org/reverse_lookup"
-    })
+    synonym_service: dict = field(
+        default_factory=lambda: {
+            "url": "https://name-resolution-sri.renci.org/reverse_lookup"
+        }
+    )
 
     # Ontology metadata helper config that will be passed to annotate.OntologyHelper constructor
-    ontology_helper: dict = field(default_factory=lambda: {
-        "url": "https://api.monarchinitiative.org/api/bioentity/"
-    })
+    ontology_helper: dict = field(
+        default_factory=lambda: {
+            "url": "https://api.monarchinitiative.org/api/bioentity/"
+        }
+    )
 
     # Redlist of identifiers not to expand via TranQL
     tranql_exclude_identifiers: list = field(default_factory=lambda: ["CHEBI:17336"])
 
-    tranql_queries: dict = field(default_factory=lambda: {
-        "disease": ["disease", "phenotypic_feature"],
-        "pheno": ["phenotypic_feature", "disease"],
-        "anat": ["disease", "anatomical_entity"],
-        "chem_to_disease": ["chemical_entity", "disease"],
-        "small_molecule_to_disease": ["small_molecule", "disease"],
-        "chemical_mixture_to_disease": ["chemical_mixture", "disease"],
-        "phen_to_anat": ["phenotypic_feature", "anatomical_entity"],
-    })
+    tranql_queries: dict = field(
+        default_factory=lambda: {
+            "disease": ["disease", "phenotypic_feature"],
+            "pheno": ["phenotypic_feature", "disease"],
+            "anat": ["disease", "anatomical_entity"],
+            "chem_to_disease": ["chemical_entity", "disease"],
+            "small_molecule_to_disease": ["small_molecule", "disease"],
+            "chemical_mixture_to_disease": ["chemical_mixture", "disease"],
+            "phen_to_anat": ["phenotypic_feature", "anatomical_entity"],
+        }
+    )
 
-    node_to_element_queries: dict = field(default_factory=lambda: {
-        # Dug element type to cast the query kg nodes to
-        "cde": {
-            # Parse nodes matching criteria in kg
-            "node_type": "biolink:Publication",
-            "curie_prefix": "HEALCDE",
-            # list of attributes that are lists to be casted to strings
-            "list_field_choose_first": [
-                "files"
-            ],
-            "attribute_mapping": {
-                # "DugElement Attribute" : "KG Node attribute"
-                "name": "name",
-                "desc": "summary",
-                "collection_name": "cde_category",
-                "collection_id": "cde_category",
-                "action": "files"
-            }
-        }
-    })
+    node_to_element_queries: dict = field(
+        default_factory=lambda: {
+            # Dug element type to cast the query kg nodes to
+            "cde": {
+                # Parse nodes matching criteria in kg
+                "node_type": "biolink:Publication",
+                "curie_prefix": "HEALCDE",
+                # list of attributes that are lists to be casted to strings
+                "list_field_choose_first": ["files"],
+                "attribute_mapping": {
+                    # "DugElement Attribute" : "KG Node attribute"
+                    "name": "name",
+                    "desc": "summary",
+                    "collection_name": "cde_category",
+                    "collection_id": "cde_category",
+                    "action": "files",
+                },
+            }
+        }
+    )
 
-    concept_expander: dict = field(default_factory=lambda: {
-        "url": "https://tranql-dev.renci.org/tranql/query?dynamic_id_resolution=true&asynchronous=false",
-        "min_tranql_score": 0.0
-    })
+    concept_expander: dict = field(
+        default_factory=lambda: {
+            "url": "https://tranql-dev.renci.org/tranql/query?dynamic_id_resolution=true&asynchronous=false",
+            "min_tranql_score": 0.0,
+        }
+    )
 
     # List of ontology types that can be used even if they fail normalization
-    ontology_greenlist: list = field(default_factory=lambda: ["PATO", "CHEBI", "MONDO", "UBERON", "HP", "MESH", "UMLS"])
+    ontology_greenlist: list = field(
+        default_factory=lambda: [
+            "PATO",
+            "CHEBI",
+            "MONDO",
+            "UBERON",
+            "HP",
+            "MESH",
+            "UMLS",
+        ]
+    )
 
     @classmethod
     def from_env(cls):

@@ -107,7 +136,7 @@ def from_env(cls):
             "elastic_password": "ELASTIC_PASSWORD",
             "redis_host": "REDIS_HOST",
             "redis_port": "REDIS_PORT",
-            "redis_password": "REDIS_PASSWORD"
+            "redis_password": "REDIS_PASSWORD",
         }
 
         kwargs = {}
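The substantive change above is that per-annotator settings now live in annotator_args, keyed by annotator_type ("monarch" by default, with a "sapbert" entry), instead of a single annotator dict. A minimal sketch of how calling code might pull out the right block; the helper function is hypothetical and not part of this diff:

# Hypothetical helper, not part of this diff: pick the argument block
# for whichever annotator the Config selects.
from dug.config import Config


def annotator_args_for(config: Config) -> dict:
    try:
        return config.annotator_args[config.annotator_type]
    except KeyError as exc:
        raise ValueError(f"unknown annotator_type: {config.annotator_type!r}") from exc


config = Config.from_env()
print(annotator_args_for(config))  # with the defaults, this prints the "monarch" block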
src/dug/core/__init__.py (2 changes: 1 addition, 1 deletion)

@@ -63,7 +63,7 @@ def crawl(self, target_name: str, parser_type: str, annotator_type: str, element
 
        pm = get_plugin_manager()
        parser = get_parser(pm.hook, parser_type)
-        annotator = get_annotator(pm.hook, annotator_type)
+        annotator = get_annotator(pm.hook, annotator_type, self._factory.config)
        targets = get_targets(target_name)
 
        for target in targets:
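With this change, get_annotator receives the Config as a third argument, so annotator construction can read config.annotator_args for the chosen type instead of relying on global state. A rough sketch of the resulting call pattern; the import paths are assumptions and the real plugin wiring is not shown in this diff:

# Illustrative call pattern only; import paths are assumed, not taken from this diff.
from dug.config import Config
from dug.core import get_annotator, get_plugin_manager

config = Config.from_env()
pm = get_plugin_manager()

# The third argument is new: get_annotator can now look up
# config.annotator_args[config.annotator_type] ("monarch" or "sapbert")
# rather than reaching for module-level configuration.
annotator = get_annotator(pm.hook, config.annotator_type, config)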