Feature/mx-1533 graph id provider (#24)
# PR Context
This change lets us stop re-creating the models from mex-common in
mex-backend, which we only did to sidestep the
[set_identifiers](https://github.com/robert-koch-institut/mex-common/blob/0.19.1/mex/common/models/extracted_data.py#L73)
validator. We now have a graph identity provider that the backend can use
to assign and fetch identities directly from the database.

# Added
- graph identity provider that assigns ids to extracted data
- generalize type enums into DynamicStrEnum superclass
- seed primary source for mex on connector init
- test fixture that makes Identifiers deterministic

# Changes
- assign name to uniqueness constraint
- use graph identity provider in identity endpoints
- add module name to dynamic models for better debugging
- allow 'MergedThing' as well as 'Thing' as entityType query parameter

# Removed
- remove dynamic extracted model classes and use those from mex-common

# Fixed
- don't allow identifierInPrimarySource changes on node updates
cutoffthetop authored Feb 19, 2024
1 parent 260a047 commit 15fc10e
Showing 39 changed files with 1,124 additions and 804 deletions.
2 changes: 1 addition & 1 deletion .cruft.json
@@ -1,6 +1,6 @@
{
"template": "https://github.com/robert-koch-institut/mex-template",
"commit": "13034bb01a8da263e669936438ca099ea4e56afb",
"commit": "d1a461de3c7ff099045b71a156ad667887c32368",
"checkout": null,
"context": {
"cookiecutter": {
9 changes: 8 additions & 1 deletion .github/dependabot.yml
@@ -1,9 +1,16 @@
version: 2
updates:
- package-ecosystem: "github-actions"
allow:
- dependency-type: "all"
directory: "/"
open-pull-requests-limit: 1
schedule:
interval: "monthly"
- package-ecosystem: "pip"
allow:
- dependency-type: "all"
directory: "/"
open-pull-requests-limit: 1
schedule:
interval: "weekly"
interval: "daily"
6 changes: 4 additions & 2 deletions .github/workflows/cookiecutter.yml
@@ -2,7 +2,9 @@ name: Cookiecutter

on:
push:
branches: ["main"]
pull_request:
schedule:
- cron: '0 0 * * *'
workflow_dispatch:

env:
@@ -11,7 +13,7 @@ env:
PIP_PREFER_BINARY: on

jobs:
lint:
cruft:
runs-on: ubuntu-latest
timeout-minutes: 10
steps:
8 changes: 7 additions & 1 deletion .github/workflows/cve-scan.yml
@@ -1,9 +1,15 @@
name: CVE Scan

on:
push:
# Workflows triggered by Dependabot on the "push" event run with read-only access.
# Uploading Code Scanning results requires write access. We therefore only use the
# "pull_request" trigger for Dependabot branches.
branches-ignore:
- 'dependabot/**'
pull_request:
schedule:
- cron: '0 2 * * *'
- cron: '0 0 * * *'
workflow_dispatch:

jobs:
3 changes: 1 addition & 2 deletions .github/workflows/docker.yml
@@ -2,8 +2,7 @@ name: Docker

on:
push:
tags:
- '**'
tags: ["**"]
workflow_dispatch:

jobs:
2 changes: 1 addition & 1 deletion .github/workflows/documentation.yml
@@ -2,7 +2,7 @@ name: Documentation

on:
push:
branches: ["main"]
tags: ["**"]
workflow_dispatch:

env:
1 change: 1 addition & 0 deletions .github/workflows/linting.yml
@@ -2,6 +2,7 @@ name: Linting

on:
push:
pull_request:
workflow_dispatch:

env:
1 change: 1 addition & 0 deletions .github/workflows/open-code.yml
@@ -3,6 +3,7 @@ name: OpenCoDE
on:
push:
branches: ["main"]
tags: ["**"]
workflow_dispatch:

jobs:
1 change: 1 addition & 0 deletions .github/workflows/testing.yml
@@ -2,6 +2,7 @@ name: Testing

on:
push:
pull_request:
workflow_dispatch:

env:
4 changes: 2 additions & 2 deletions .pre-commit-config.yaml
@@ -3,12 +3,12 @@ default_language_version:
python: python3.11
repos:
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.1.13
rev: v0.2.0
hooks:
- id: ruff
args: [--fix, --exit-non-zero-on-fix]
- repo: https://github.com/psf/black
rev: 23.12.1
rev: 24.2.0
hooks:
- id: black
- repo: https://github.com/pre-commit/pre-commit-hooks
12 changes: 12 additions & 0 deletions CHANGELOG.md
@@ -15,15 +15,27 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- CHANGELOG file
- cruft template link
- open-code workflow
- graph identity provider that assigns ids to extracted data
- generalize type enums into DynamicStrEnum superclass
- seed primary source for mex on connector init
- test fixture that makes Identifiers deterministic

### Changes

- harmonized boilerplate
- assign name to uniqueness constraint
- use graph identity provider in identity endpoints
- add module name to dynamic models for better debugging
- allow 'MergedThing' as well as 'Thing' as entityType query parameter

### Deprecated

### Removed

- remove dynamic extracted model classes and use those from mex-common

### Fixed

- don't allow identifierInPrimarySource changes on node updates

### Security
9 changes: 9 additions & 0 deletions mex/backend/__init__.py
@@ -0,0 +1,9 @@
from pkgutil import extend_path

__path__ = extend_path(__path__, __name__)

from mex.backend.identity.provider import GraphIdentityProvider
from mex.backend.types import BackendIdentityProvider
from mex.common.identity.registry import register_provider

register_provider(BackendIdentityProvider.GRAPH, GraphIdentityProvider)
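The registration call above follows a plain registry pattern: an enum key is mapped to a provider class so that other code can look the class up by key. As a rough illustration only, assuming nothing about how mex-common implements `register_provider`, the pattern could be sketched like this. The names `ProviderKey`, `_REGISTRY`, `register`, `lookup` and `DummyGraphProvider` are all hypothetical.

```python
from enum import Enum


class ProviderKey(Enum):
    """Hypothetical stand-in for an identity provider enum."""

    GRAPH = "graph"


# Hypothetical module-level registry mapping keys to provider classes.
_REGISTRY: dict[ProviderKey, type] = {}


def register(key: ProviderKey, provider_cls: type) -> None:
    """Register a provider class under a key, refusing duplicates."""
    if key in _REGISTRY:
        raise RuntimeError(f"provider already registered for {key}")
    _REGISTRY[key] = provider_cls


def lookup(key: ProviderKey) -> type:
    """Return the provider class registered under the given key."""
    try:
        return _REGISTRY[key]
    except KeyError:
        raise RuntimeError(f"no provider registered for {key}") from None


class DummyGraphProvider:
    """Hypothetical provider class used only for this illustration."""


register(ProviderKey.GRAPH, DummyGraphProvider)
```

Registering at package import time, as `mex/backend/__init__.py` does, means the provider is available as soon as any backend module is imported.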
2 changes: 1 addition & 1 deletion mex/backend/extracted/main.py
@@ -25,7 +25,7 @@ def search_extracted_items(
query_results = graph.query_nodes(
q,
stableTargetId,
[t.value for t in entityType or ExtractedType],
[str(t.value) for t in entityType or ExtractedType],
skip,
limit,
)
53 changes: 12 additions & 41 deletions mex/backend/extracted/models.py
@@ -1,50 +1,21 @@
from enum import Enum, EnumMeta, _EnumDict
from typing import TYPE_CHECKING, Generator, Literal, Union, cast

from pydantic import Field, create_model

from mex.common.models import BASE_MODEL_CLASSES, BaseExtractedData, BaseModel
from mex.common.transform import dromedary_to_snake


def _collect_extracted_model_classes(
base_models: list[type[BaseModel]],
) -> Generator[tuple[str, type[BaseExtractedData]], None, None]:
"""Create extracted model classes with type for the given MEx models."""
for model in base_models:
# to satisfy current frontend, rename ExtractedThing -> Thing
name = model.__name__.replace("Base", "Extracted")
extracted_model = create_model(
name,
__base__=(model, BaseExtractedData),
entityType=(Literal[name], Field(name, alias="$type", frozen=True)),
)
yield name, cast(type[BaseExtractedData], extracted_model)


# mx-1533 stopgap: because we do not yet have a backend-powered identity provider,
# we need to re-create the extracted models without automatic
# identifier and stableTargetId assignment
EXTRACTED_MODEL_CLASSES_BY_NAME: dict[str, type[BaseExtractedData]] = dict(
_collect_extracted_model_classes(BASE_MODEL_CLASSES)
)

from enum import Enum
from typing import TYPE_CHECKING, Union

class ExtractedTypeMeta(EnumMeta):
"""Meta class to dynamically populate the entity type enumeration."""
from pydantic import Field

def __new__(
cls: type["ExtractedTypeMeta"], name: str, bases: tuple[type], dct: _EnumDict
) -> "ExtractedTypeMeta":
"""Create a new entity type enum by adding an entry for each model."""
for entity_type in EXTRACTED_MODEL_CLASSES_BY_NAME:
dct[dromedary_to_snake(entity_type).upper()] = entity_type
return super().__new__(cls, name, bases, dct)
from mex.backend.types import DynamicStrEnum
from mex.common.models import (
EXTRACTED_MODEL_CLASSES_BY_NAME,
BaseExtractedData,
BaseModel,
)


class ExtractedType(Enum, metaclass=ExtractedTypeMeta):
class ExtractedType(Enum, metaclass=DynamicStrEnum):
"""Enumeration of possible types for extracted items."""

__names__ = list(EXTRACTED_MODEL_CLASSES_BY_NAME)


if TYPE_CHECKING: # pragma: no cover
AnyExtractedModel = BaseExtractedData
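The `DynamicStrEnum` metaclass lives in `mex/backend/types.py`, which is not part of the diff shown here. A minimal sketch of how such a metaclass might populate an enum from a `__names__` list, modelled on the removed `ExtractedTypeMeta`, is given below. The `dromedary_to_snake` helper is a simplified local stand-in for the one in `mex.common.transform`, and the member names in the usage are examples only.

```python
import re
from enum import Enum, EnumMeta


def dromedary_to_snake(name: str) -> str:
    """Simplified stand-in: convert `SomeName` to `some_name`."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


class DynamicStrEnum(EnumMeta):
    """Sketch of a metaclass that adds one enum member per entry in `__names__`."""

    def __new__(mcs, name, bases, dct, **kwds):
        # Remove the helper attribute so it does not end up on the enum class,
        # then add an UPPER_SNAKE_CASE member for every listed name.
        for member_value in dct.pop("__names__"):
            dct[dromedary_to_snake(member_value).upper()] = member_value
        return super().__new__(mcs, name, bases, dct, **kwds)


class ExtractedType(Enum, metaclass=DynamicStrEnum):
    """Example enum; the names here are illustrative, not the full model list."""

    __names__ = ["ExtractedPerson", "ExtractedPrimarySource"]
```

With this approach each type enum only declares its list of names, and the member-building logic is shared by every enum that uses the metaclass.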
2 changes: 1 addition & 1 deletion mex/backend/fields.py
@@ -3,7 +3,7 @@

from pydantic.fields import FieldInfo

from mex.backend.extracted.models import EXTRACTED_MODEL_CLASSES_BY_NAME
from mex.common.models import EXTRACTED_MODEL_CLASSES_BY_NAME
from mex.common.types import Identifier, Text


19 changes: 18 additions & 1 deletion mex/backend/graph/connector.py
@@ -24,7 +24,13 @@
from mex.common.connector import BaseConnector
from mex.common.exceptions import MExError
from mex.common.logging import logger
from mex.common.models import EXTRACTED_MODEL_CLASSES_BY_NAME
from mex.common.models import (
EXTRACTED_MODEL_CLASSES_BY_NAME,
MEX_PRIMARY_SOURCE_IDENTIFIER,
MEX_PRIMARY_SOURCE_IDENTIFIER_IN_PRIMARY_SOURCE,
MEX_PRIMARY_SOURCE_STABLE_TARGET_ID,
ExtractedPrimarySource,
)
from mex.common.types import Identifier


@@ -48,6 +54,7 @@ def __init__(self) -> None:
self._check_connectivity_and_authentication()
self._seed_constraints()
self._seed_indices()
self._seed_primary_source()

def _check_connectivity_and_authentication(self) -> None:
"""Check the connectivity and authentication to the graph."""
@@ -87,6 +94,16 @@ def _seed_indices(self) -> GraphResult:
},
)

def _seed_primary_source(self) -> Identifier:
"""Ensure the primary source `mex` is seeded and linked to itself."""
mex_primary_source = ExtractedPrimarySource.model_construct(
hadPrimarySource=MEX_PRIMARY_SOURCE_STABLE_TARGET_ID,
identifier=MEX_PRIMARY_SOURCE_IDENTIFIER,
identifierInPrimarySource=MEX_PRIMARY_SOURCE_IDENTIFIER_IN_PRIMARY_SOURCE,
stableTargetId=MEX_PRIMARY_SOURCE_STABLE_TARGET_ID,
)
return self.ingest([mex_primary_source])[0]

def mcommit(
self, *statements_with_parameters: tuple[str, dict[str, Any] | None]
) -> list[GraphResult]:
2 changes: 1 addition & 1 deletion mex/backend/graph/queries.py
@@ -3,7 +3,7 @@
"""

CREATE_CONSTRAINTS_STATEMENT = r"""
CREATE CONSTRAINT IF NOT EXISTS
CREATE CONSTRAINT identifier_uniqueness IF NOT EXISTS
FOR (n:{node_label})
REQUIRE n.identifier IS UNIQUE;
"""
8 changes: 3 additions & 5 deletions mex/backend/graph/transform.py
@@ -3,15 +3,12 @@

from pydantic import BaseModel as PydanticBaseModel

from mex.backend.extracted.models import (
EXTRACTED_MODEL_CLASSES_BY_NAME,
AnyExtractedModel,
)
from mex.backend.extracted.models import AnyExtractedModel
from mex.backend.fields import REFERENCE_FIELDS_BY_CLASS_NAME
from mex.backend.graph.hydrate import dehydrate, hydrate
from mex.backend.transform import to_primitive
from mex.common.identity import Identity
from mex.common.models import BaseModel, MExModel
from mex.common.models import EXTRACTED_MODEL_CLASSES_BY_NAME, BaseModel, MExModel


class MergableNode(PydanticBaseModel):
@@ -33,6 +30,7 @@ def transform_model_to_node(model: BaseModel) -> MergableNode:
on_match = on_create.copy()
on_match.pop("identifier")
on_match.pop("stableTargetId")
on_match.pop("identifierInPrimarySource")

return MergableNode(on_create=on_create, on_match=on_match)
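The hunk above is the fix for "don't allow identifierInPrimarySource changes on node updates": the properties written when a node is created keep all identity fields, while the properties applied to an already existing node drop them. A standalone sketch of that split, using plain dicts instead of `MergableNode`, might look as follows. The function name `split_merge_properties` and the constant `IMMUTABLE_FIELDS` are hypothetical.

```python
# Identity fields that must never change once a node exists (taken from the hunk above).
IMMUTABLE_FIELDS = ("identifier", "stableTargetId", "identifierInPrimarySource")


def split_merge_properties(
    properties: dict[str, object],
) -> tuple[dict[str, object], dict[str, object]]:
    """Return the property sets to apply on node creation and on node match."""
    on_create = dict(properties)
    # On match, leave out every identity field so an update cannot overwrite it.
    on_match = {
        key: value for key, value in properties.items() if key not in IMMUTABLE_FIELDS
    }
    return on_create, on_match
```

Unlike the `pop` calls in the diff, this sketch does not raise a `KeyError` when one of the identity fields is missing from the input.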

42 changes: 5 additions & 37 deletions mex/backend/identity/main.py
@@ -1,15 +1,10 @@
from fastapi import APIRouter
from fastapi.exceptions import HTTPException

from mex.backend.graph.connector import GraphConnector
from mex.backend.graph.transform import transform_identity_result_to_identity
from mex.backend.identity.models import IdentityAssignRequest, IdentityFetchResponse
from mex.backend.identity.provider import GraphIdentityProvider
from mex.common.exceptions import MExError
from mex.common.identity.models import Identity
from mex.common.models import (
MEX_PRIMARY_SOURCE_IDENTIFIER_IN_PRIMARY_SOURCE,
MEX_PRIMARY_SOURCE_STABLE_TARGET_ID,
)
from mex.common.types import Identifier

router = APIRouter()
@@ -18,35 +13,11 @@
@router.post("/identity", status_code=200, tags=["extractors"])
def assign_identity(request: IdentityAssignRequest) -> Identity:
"""Insert a new identity or update an existing one."""
connector = GraphConnector.get()
graph_result = connector.fetch_identities(
identity_provider = GraphIdentityProvider.get()
return identity_provider.assign(
had_primary_source=request.hadPrimarySource,
identifier_in_primary_source=request.identifierInPrimarySource,
)
if len(graph_result.data) > 1:
raise MExError("found multiple identities indicating graph inconsistency")
if len(graph_result.data) == 1:
return transform_identity_result_to_identity(graph_result.data[0])
if (
request.identifierInPrimarySource
== MEX_PRIMARY_SOURCE_IDENTIFIER_IN_PRIMARY_SOURCE
and request.hadPrimarySource == MEX_PRIMARY_SOURCE_STABLE_TARGET_ID
):
# This is to deal with the edge case where primary source is the parent of
# all primary sources and has no parents for itself,
# this will add itself as its parent.
return Identity(
hadPrimarySource=request.hadPrimarySource,
identifier=MEX_PRIMARY_SOURCE_STABLE_TARGET_ID,
identifierInPrimarySource=request.identifierInPrimarySource,
stableTargetId=MEX_PRIMARY_SOURCE_STABLE_TARGET_ID,
)
return Identity(
hadPrimarySource=request.hadPrimarySource,
identifier=Identifier.generate(),
identifierInPrimarySource=request.identifierInPrimarySource,
stableTargetId=Identifier.generate(),
)


@router.get("/identity", status_code=200, tags=["extractors"])
@@ -60,16 +31,13 @@ def fetch_identity(
Either provide `stableTargetId` or `hadPrimarySource`
and `identifierInPrimarySource` together to get a unique result.
"""
connector = GraphConnector.get()
identity_provider = GraphIdentityProvider.get()
try:
graph_result = connector.fetch_identities(
identities = identity_provider.fetch(
had_primary_source=hadPrimarySource,
identifier_in_primary_source=identifierInPrimarySource,
stable_target_id=stableTargetId,
)
except MExError as error:
raise HTTPException(400, error.args)
identities = [
transform_identity_result_to_identity(result) for result in graph_result.data
]
return IdentityFetchResponse(items=identities, total=len(identities))
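`mex/backend/identity/provider.py` is not included in the diff shown here, so the endpoints above only reveal the provider's interface: `assign(had_primary_source, identifier_in_primary_source)` and `fetch(...)` with three optional filters. To illustrate the assign-or-reuse behaviour that the removed endpoint code implemented inline, here is an in-memory sketch. It assumes nothing about the real `GraphIdentityProvider`, which talks to the graph database. The `Identity` dataclass, the class name `InMemoryIdentityProvider` and the use of `uuid4` for identifiers are all stand-ins.

```python
from dataclasses import dataclass
from typing import Optional
from uuid import uuid4


@dataclass(frozen=True)
class Identity:
    """Simplified stand-in for the identity model in mex-common."""

    hadPrimarySource: str
    identifierInPrimarySource: str
    identifier: str
    stableTargetId: str


class InMemoryIdentityProvider:
    """Hypothetical provider that keeps identities in a dict instead of a graph."""

    def __init__(self) -> None:
        self._identities: dict[tuple[str, str], Identity] = {}

    def assign(
        self, had_primary_source: str, identifier_in_primary_source: str
    ) -> Identity:
        """Return the existing identity for this pair or create a new one."""
        key = (had_primary_source, identifier_in_primary_source)
        if key not in self._identities:
            self._identities[key] = Identity(
                hadPrimarySource=had_primary_source,
                identifierInPrimarySource=identifier_in_primary_source,
                identifier=uuid4().hex,  # stand-in for Identifier.generate()
                stableTargetId=uuid4().hex,
            )
        return self._identities[key]

    def fetch(
        self,
        *,
        had_primary_source: Optional[str] = None,
        identifier_in_primary_source: Optional[str] = None,
        stable_target_id: Optional[str] = None,
    ) -> list[Identity]:
        """Return all identities matching every filter that is not None."""
        filters = {
            "hadPrimarySource": had_primary_source,
            "identifierInPrimarySource": identifier_in_primary_source,
            "stableTargetId": stable_target_id,
        }
        return [
            identity
            for identity in self._identities.values()
            if all(
                value is None or getattr(identity, field) == value
                for field, value in filters.items()
            )
        ]
```

Moving this logic behind a provider keeps the endpoints thin, and per the PR context it lets the models from mex-common be used directly instead of being re-created in the backend.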