feature/mx-1381 rework database model #25

Merged
merged 69 commits into from
Apr 8, 2024
2a4badc
Some clean up
cutoffthetop Jan 16, 2024
d62dedd
Implement graph id provider
cutoffthetop Jan 16, 2024
59e584e
Remove docs
cutoffthetop Jan 16, 2024
0bcac39
Merge branch 'main' of https://github.com/robert-koch-institut/mex-ba…
cutoffthetop Jan 18, 2024
5af07fa
Clean up enum, add tests, bump versions
cutoffthetop Jan 19, 2024
d4b567e
Update expectation
cutoffthetop Jan 19, 2024
5aab81a
Changelog
cutoffthetop Jan 19, 2024
514d81e
Changelog
cutoffthetop Jan 19, 2024
a341a83
Poetry update
cutoffthetop Jan 19, 2024
feed73c
Stop inline nested and model as nodes instead
cutoffthetop Jan 26, 2024
3deea4c
WIP
cutoffthetop Feb 7, 2024
6678e82
Merge branch 'main' of https://github.com/robert-koch-institut/mex-ba…
cutoffthetop Feb 7, 2024
ff652d1
Cruft update
cutoffthetop Feb 7, 2024
b35cf1d
Fix tests
cutoffthetop Feb 7, 2024
f9751a9
Merge branch 'feature/mx-1533-graph-id-provider' into feature/mx-1381…
cutoffthetop Feb 8, 2024
e27bddc
Fixing tests
cutoffthetop Feb 8, 2024
e73fec2
Rewrite using jinja
cutoffthetop Feb 14, 2024
61d0902
Create field lists nicer
cutoffthetop Feb 14, 2024
798524c
Add edge pruning
cutoffthetop Feb 14, 2024
55a2b48
Elevate query testing
cutoffthetop Feb 15, 2024
14165f0
Update tests
cutoffthetop Feb 15, 2024
f9376f2
Polishing and version bumps
cutoffthetop Feb 15, 2024
d4de1d1
Merge branch 'main' into feature/mx-1381-prep-rule-endpoint
cutoffthetop Feb 19, 2024
3c6809b
Update lock
cutoffthetop Feb 19, 2024
c8250c3
Set id provider for integration testing
cutoffthetop Feb 19, 2024
9fee560
Fix connector test
cutoffthetop Feb 19, 2024
1ae565b
Add query readme and docs
cutoffthetop Feb 19, 2024
dce0172
Add example to arg
cutoffthetop Feb 20, 2024
4452c6e
Fix tests and update common
cutoffthetop Feb 21, 2024
f53e4be
Merge branch 'main' of https://github.com/robert-koch-institut/mex-ba…
cutoffthetop Feb 21, 2024
be9bfc8
Rename query_nodes to fetch_extracted_data
cutoffthetop Feb 21, 2024
bd11479
No need to stringify identifier
cutoffthetop Feb 21, 2024
e34f35e
Simplify merge node gc query
cutoffthetop Feb 21, 2024
83702b2
Simplify query builder teardown
cutoffthetop Feb 21, 2024
c2f2116
Update cruft and fix linting
cutoffthetop Feb 21, 2024
b7f556c
Merge branch 'main' of https://github.com/robert-koch-institut/mex-ba…
cutoffthetop Feb 22, 2024
0544134
Add random pytest order
cutoffthetop Feb 22, 2024
db73b46
Remove id-provider env var
cutoffthetop Feb 22, 2024
26a4375
Rename Result.update_counters and add tests
cutoffthetop Feb 22, 2024
6167888
Simplify refs match clause
cutoffthetop Feb 22, 2024
7ead1fc
More speaking query variables
cutoffthetop Feb 22, 2024
038c69d
Add doc to merge_node query
cutoffthetop Feb 22, 2024
42a94bb
Re-create index if there were changes to searchable classes and fields
cutoffthetop Feb 22, 2024
05bb00c
Update docs, make merge_node/edges private
cutoffthetop Feb 22, 2024
c316465
Merge branch 'main' of https://github.com/robert-koch-institut/mex-ba…
cutoffthetop Feb 27, 2024
e0530d8
Update cruft and deps
cutoffthetop Feb 27, 2024
fd65b57
Merge branch 'main' of https://github.com/robert-koch-institut/mex-ba…
cutoffthetop Mar 5, 2024
d37df78
Update cruft 12165319453990fdbe02bce39a3236337e298bc0
cutoffthetop Mar 5, 2024
d52f08d
Merge branch 'main' of https://github.com/robert-koch-institut/mex-ba…
cutoffthetop Mar 15, 2024
7fbac89
Merge branch 'main' of https://github.com/robert-koch-institut/mex-ba…
cutoffthetop Mar 15, 2024
489c242
Reduce diff
cutoffthetop Mar 15, 2024
71d44c2
Update docstring
cutoffthetop Mar 27, 2024
0c531b6
Update versions
cutoffthetop Mar 27, 2024
a4b2e36
Update uvicorn and neo4j
cutoffthetop Mar 27, 2024
3fe1882
Merge branch 'main' of https://github.com/robert-koch-institut/mex-ba…
cutoffthetop Mar 27, 2024
0b957e8
Use speaking names for gc
cutoffthetop Mar 28, 2024
e52136b
Add APOC example
cutoffthetop Apr 2, 2024
fd09bc3
Remove redundant label filter
cutoffthetop Apr 2, 2024
dbb1e51
Add annotated test case
cutoffthetop Apr 2, 2024
4a2f5ef
Remove local import
cutoffthetop Apr 2, 2024
c7e6b08
Merge branch 'main' of https://github.com/robert-koch-institut/mex-ba…
cutoffthetop Apr 2, 2024
c004b98
Update changelog
cutoffthetop Apr 2, 2024
98cce8d
update versions
cutoffthetop Apr 2, 2024
32191a4
Rename to _contains_only_types
cutoffthetop Apr 2, 2024
febd447
Fix docstring
cutoffthetop Apr 3, 2024
3b83109
Expand docstrings
cutoffthetop Apr 3, 2024
871fba3
Ensure lifespan is called
cutoffthetop Apr 3, 2024
3fb35d5
Fix test isolation and close coverage gaps
cutoffthetop Apr 3, 2024
0e4daea
Merge branch 'main' into feature/mx-1381-prep-rule-endpoint
cutoffthetop Apr 5, 2024
4 changes: 2 additions & 2 deletions .pre-commit-config.yaml
@@ -3,12 +3,12 @@ default_language_version:
   python: python3.11
 repos:
   - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: v0.3.2
+    rev: v0.3.5
     hooks:
       - id: ruff
         args: [--fix, --exit-non-zero-on-fix]
   - repo: https://github.com/psf/black
-    rev: 24.2.0
+    rev: 24.3.0
     hooks:
       - id: black
   - repo: https://github.com/pre-commit/pre-commit-hooks
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -11,10 +11,17 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ### Changes

+- re-implemented queries as templated cql files
+- updated graph connector for new queries
+- improved isolation of neo4j dependency
+- improved documentation and code-readability
+
 ### Deprecated

 ### Removed

+- trashed hydration module
+
 ### Fixed

 ### Security
14 changes: 6 additions & 8 deletions mex/backend/extracted/main.py
@@ -2,11 +2,9 @@

 from fastapi import APIRouter, Query

-from mex.backend.extracted.models import ExtractedItemSearchResponse, ExtractedType
-from mex.backend.extracted.transform import (
-    transform_graph_results_to_extracted_item_search_response,
-)
+from mex.backend.extracted.models import ExtractedItemSearchResponse
 from mex.backend.graph.connector import GraphConnector
+from mex.backend.types import ExtractedType
 from mex.common.types import Identifier

 router = APIRouter()
@@ -15,20 +13,20 @@
 @router.get("/extracted-item", tags=["editor"])
 def search_extracted_items(
     q: Annotated[str, Query(max_length=100)] = "",
-    stableTargetId: Identifier | None = None,  # noqa: N803
-    entityType: Annotated[  # noqa: N803
+    stableTargetId: Identifier | None = None,
+    entityType: Annotated[
         Sequence[ExtractedType], Query(max_length=len(ExtractedType))
     ] = [],
     skip: Annotated[int, Query(ge=0, le=10e10)] = 0,
     limit: Annotated[int, Query(ge=1, le=100)] = 10,
 ) -> ExtractedItemSearchResponse:
     """Search for extracted items by query text or by type and id."""
     graph = GraphConnector.get()
-    query_results = graph.query_nodes(
+    result = graph.fetch_extracted_data(
         q,
         stableTargetId,
         [str(t.value) for t in entityType or ExtractedType],
         skip,
         limit,
     )
-    return transform_graph_results_to_extracted_item_search_response(query_results)
+    return ExtractedItemSearchResponse.model_validate(result.one())
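
The `entityType or ExtractedType` expression in the handler falls back to every enum member when no type filter is passed: an empty sequence is falsy, and iterating an `Enum` class yields its members in definition order. A minimal sketch of that idiom follows. It assumes a hand-written enum with two hypothetical members (the real `ExtractedType` is built dynamically from the model classes) and a hypothetical helper name, `resolve_entity_types`.

```python
from enum import Enum
from typing import Sequence


class ExtractedType(Enum):
    """Hypothetical stand-in; the real enum is generated from the model classes."""

    PERSON = "ExtractedPerson"
    SOURCE = "ExtractedSource"


def resolve_entity_types(entity_type: Sequence[ExtractedType] = ()) -> list[str]:
    """Return the requested type names, or all type names if none were requested."""
    # An empty sequence is falsy, so `or` falls through to the enum class itself,
    # and iterating an Enum class yields all of its members in definition order.
    return [str(t.value) for t in entity_type or ExtractedType]


all_types = resolve_entity_types()
only_sources = resolve_entity_types([ExtractedType.SOURCE])
```

With no filter, `all_types` holds both type names; with an explicit filter, only the requested names are passed on to the graph query.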
24 changes: 3 additions & 21 deletions mex/backend/extracted/models.py
@@ -1,30 +1,12 @@
-from enum import Enum
-from typing import TYPE_CHECKING, Union
+from typing import Annotated

 from pydantic import Field

-from mex.backend.types import DynamicStrEnum
-from mex.common.models import (
-    EXTRACTED_MODEL_CLASSES_BY_NAME,
-    BaseExtractedData,
-    BaseModel,
-)
-
-
-class ExtractedType(Enum, metaclass=DynamicStrEnum):
-    """Enumeration of possible types for extracted items."""
-
-    __names__ = list(EXTRACTED_MODEL_CLASSES_BY_NAME)
-
-
-if TYPE_CHECKING:  # pragma: no cover
-    AnyExtractedModel = BaseExtractedData
-else:
-    AnyExtractedModel = Union[*EXTRACTED_MODEL_CLASSES_BY_NAME.values()]
+from mex.common.models import AnyExtractedModel, BaseModel


 class ExtractedItemSearchResponse(BaseModel):
     """Response body for the extracted item search endpoint."""

     total: int
-    items: list[AnyExtractedModel] = Field(discriminator="entityType")
+    items: Annotated[list[AnyExtractedModel], Field(discriminator="entityType")]
40 changes: 0 additions & 40 deletions mex/backend/extracted/transform.py

This file was deleted.

172 changes: 138 additions & 34 deletions mex/backend/fields.py
@@ -1,60 +1,164 @@
-from types import UnionType
-from typing import Annotated, Any, Generator, Union, get_args, get_origin
+from types import NoneType, UnionType
+from typing import (
+    Annotated,
+    Any,
+    Callable,
+    Generator,
+    Mapping,
+    Union,
+    get_args,
+    get_origin,
+)

+from pydantic import BaseModel
 from pydantic.fields import FieldInfo

+from mex.backend.types import LiteralStringType
 from mex.common.models import EXTRACTED_MODEL_CLASSES_BY_NAME
-from mex.common.types import Identifier, Text
+from mex.common.types import MERGED_IDENTIFIER_CLASSES, Link, Text


 def _get_inner_types(annotation: Any) -> Generator[type, None, None]:
-    """Yield all inner types from Unions, lists and annotations."""
+    """Yield all inner types from unions, lists and type annotations (except NoneType).
+
+    Args:
+        annotation: A valid python type annotation
+
+    Returns:
+        A generator for all (non-NoneType) types found in the annotation
+    """
     if get_origin(annotation) == Annotated:
         yield from _get_inner_types(get_args(annotation)[0])
     elif get_origin(annotation) in (Union, UnionType, list):
         for arg in get_args(annotation):
             yield from _get_inner_types(arg)
-    elif annotation is None:
-        yield type(None)
-    else:
+    elif annotation not in (None, NoneType):
         yield annotation


-def is_reference_field(field: FieldInfo) -> bool:
-    """Return whether the given field contains a stable target id."""
-    return any(
-        isinstance(t, type) and issubclass(t, Identifier)
-        for t in _get_inner_types(field.annotation)
-    )
+def _contains_only_types(field: FieldInfo, *types: type) -> bool:
+    """Return whether a `field` is annotated as one of the given `types`.

+    Unions, lists and type annotations are checked for their inner types and only the
+    non-`NoneType` types are considered for the type-check.

-def is_text_field(field: FieldInfo) -> bool:
-    """Return whether the given field is holding text objects."""
-    return any(
-        isinstance(t, type) and issubclass(t, Text)
-        for t in _get_inner_types(field.annotation)
-    )
+    Args:
+        field: A pydantic `FieldInfo` object
+        types: Types to look for in the field's annotation

+    Returns:
+        Whether the field contains any of the given types
+    """
+    if inner_types := list(_get_inner_types(field.annotation)):
+        return all(inner_type in types for inner_type in inner_types)
+    return False

-REFERENCE_FIELDS_BY_CLASS_NAME = {
-    name: {
-        field_name
-        for field_name, field_info in cls.model_fields.items()
-        if field_name
-        not in (
-            "identifier",
-            "stableTargetId",
+
+def _group_fields_by_class_name(
+    model_classes_by_name: Mapping[str, type[BaseModel]],
+    predicate: Callable[[FieldInfo], bool],
+) -> dict[str, list[str]]:
+    """Group the field names by model class and filter them by the given predicate.
+
+    Args:
+        model_classes_by_name: Map from class names to model classes
+        predicate: Function to filter the fields of the classes by
+
+    Returns:
+        Dictionary mapping class names to a list of field names filtered by `predicate`
+    """
+    return {
+        name: sorted(
+            {
+                field_name
+                for field_name, field_info in cls.model_fields.items()
+                if predicate(field_info)
+            }
         )
-        and is_reference_field(field_info)
+        for name, cls in model_classes_by_name.items()
     }
+
+
+# fields that are immutable and can only be set once
+FROZEN_FIELDS_BY_CLASS_NAME = _group_fields_by_class_name(
+    EXTRACTED_MODEL_CLASSES_BY_NAME, lambda field_info: field_info.frozen is True
+)
+
+# static fields that are set once on class-level to a literal type
+LITERAL_FIELDS_BY_CLASS_NAME = _group_fields_by_class_name(
+    EXTRACTED_MODEL_CLASSES_BY_NAME,
+    lambda field_info: isinstance(field_info.annotation, LiteralStringType),
+)
+
+# fields typed as merged identifiers containing references to merged items
+REFERENCE_FIELDS_BY_CLASS_NAME = _group_fields_by_class_name(
+    EXTRACTED_MODEL_CLASSES_BY_NAME,
+    lambda field_info: _contains_only_types(field_info, *MERGED_IDENTIFIER_CLASSES),
+)
+
+# nested fields that contain `Text` objects
+TEXT_FIELDS_BY_CLASS_NAME = _group_fields_by_class_name(
+    EXTRACTED_MODEL_CLASSES_BY_NAME,
+    lambda field_info: _contains_only_types(field_info, Text),
+)
+
+# nested fields that contain `Link` objects
+LINK_FIELDS_BY_CLASS_NAME = _group_fields_by_class_name(
+    EXTRACTED_MODEL_CLASSES_BY_NAME,
+    lambda field_info: _contains_only_types(field_info, Link),
+)
+
+# fields annotated as `str` type
+STRING_FIELDS_BY_CLASS_NAME = _group_fields_by_class_name(
+    EXTRACTED_MODEL_CLASSES_BY_NAME,
+    lambda field_info: _contains_only_types(field_info, str),
+)
+
+# fields that should be indexed as searchable fields
+SEARCHABLE_FIELDS = sorted(
+    {
+        field_name
+        for field_names in STRING_FIELDS_BY_CLASS_NAME.values()
+        for field_name in field_names
+    }
+)
+
+# classes that have fields that should be searchable
+SEARCHABLE_CLASSES = sorted(
+    {name for name, field_names in STRING_FIELDS_BY_CLASS_NAME.items() if field_names}
+)
+
+# fields with changeable values that are not nested objects or merged item references
+MUTABLE_FIELDS_BY_CLASS_NAME = {
+    name: sorted(
+        {
+            field_name
+            for field_name in cls.model_fields
+            if field_name
+            not in (
+                *FROZEN_FIELDS_BY_CLASS_NAME[name],
+                *REFERENCE_FIELDS_BY_CLASS_NAME[name],
+                *TEXT_FIELDS_BY_CLASS_NAME[name],
+                *LINK_FIELDS_BY_CLASS_NAME[name],
+            )
+        }
+    )
     for name, cls in EXTRACTED_MODEL_CLASSES_BY_NAME.items()
 }

-TEXT_FIELDS_BY_CLASS_NAME = {
-    name: {
-        f"{field_name}_value"
-        for field_name, field_info in cls.model_fields.items()
-        if is_text_field(field_info)
-    }
+# fields with values that should be set once but are neither literal nor references
+FINAL_FIELDS_BY_CLASS_NAME = {
+    name: sorted(
+        {
+            field_name
+            for field_name in cls.model_fields
+            if field_name in FROZEN_FIELDS_BY_CLASS_NAME[name]
+            and field_name
+            not in (
+                *LITERAL_FIELDS_BY_CLASS_NAME[name],
+                *REFERENCE_FIELDS_BY_CLASS_NAME[name],
+            )
+        }
+    )
     for name, cls in EXTRACTED_MODEL_CLASSES_BY_NAME.items()
 }