Remove spurious validation test against scalars as not being lists... #415

Closed
wants to merge 7 commits into from
2 changes: 1 addition & 1 deletion .github/workflows/run_tests.yml
@@ -18,7 +18,7 @@ jobs:
fail-fast: false
matrix:
os: [ ubuntu-latest ]
python-version: [ "3.7", "3.8", "3.9", "3.10" ]
python-version: [ "3.9", "3.10" ]

runs-on: ${{ matrix.os }}

58 changes: 3 additions & 55 deletions README.md
@@ -115,10 +115,6 @@ It is likely that additional error conditions within KGX can be efficiently capt

The installation for KGX requires Python 3.9 or greater.


### Installation for users


#### Installing from PyPI

KGX is available on PyPI and can be installed using
@@ -134,54 +130,12 @@ To install a particular version of KGX, be sure to specify the version number,
pip install kgx==0.5.0
```


#### Installing from GitHub

Clone the GitHub repository and then install,

```bash
git clone https://github.com/biolink/kgx
cd kgx
python setup.py install
```


### Installation for developers

#### Setting up a development environment

To build directly from source, first clone the GitHub repository,

```bash
git clone https://github.com/biolink/kgx
cd kgx
```

Then install the necessary dependencies listed in ``requirements.txt``,

```bash
pip3 install -r requirements.txt
```


For convenience, make use of the `venv` module in Python3 to create a
lightweight virtual environment,

```
python3 -m venv env
source env/bin/activate

pip install -r requirements.txt
```

To install KGX you can do one of the following,

```bash
pip install .

# OR

python setup.py install
poetry install
```

### Setting up a testing environment for Neo4j
@@ -196,17 +150,11 @@ on your local machine.
Once Docker is up and running, run the following commands:

```bash
docker run -d --rm --name kgx-neo4j-integration-test \
-p 7474:7474 -p 7687:7687 \
--env NEO4J_AUTH=neo4j/test \
neo4j:4.3
docker run -d --rm --name kgx-neo4j-integration-test -p 7474:7474 -p 7687:7687 --env NEO4J_AUTH=neo4j/test neo4j:4.3
```

```bash
docker run -d --rm --name kgx-neo4j-unit-test \
-p 8484:7474 -p 8888:7687 \
--env NEO4J_AUTH=neo4j/test \
neo4j:4.3
docker run -d --rm --name kgx-neo4j-unit-test -p 8484:7474 -p 8888:7687 --env NEO4J_AUTH=neo4j/test neo4j:4.3
```


7 changes: 4 additions & 3 deletions docs/reference/transformer.md
@@ -54,9 +54,10 @@ This feature, when coupled with the `--stream` and a 'null' Transformer Sink (i

## Provenance of Nodes and Edges

Biolink Model 2.0 specified new [properties for edge provenance](https://github.com/biolink/kgx/blob/master/specification/kgx-format.md#edge-provenance) to replace the (now deprecated) `provided_by` provenance property (the `provided_by` property may still be used for node annotation).

One or more of these provenance properties may optionally be inserted as dictionary entries into the input arguments to specify default global values for these properties. Such values will be used when an edge lacks an explicit provenance property. If one does not specify such a global property, then the algorithm heuristically infers and sets a default `knowledge_source` value.
One or more of these provenance properties may optionally be inserted as dictionary entries into the input arguments to
specify default global values for these properties. Such values will be used when an edge lacks an explicit provenance
property. If one does not specify such a global property, then the algorithm heuristically infers and sets a default
`knowledge_source` value.

```python
from kgx.transformer import Transformer
16 changes: 15 additions & 1 deletion kgx/cli/cli_utils.py
@@ -376,8 +376,10 @@ def _validate_files(cwd: str, file_paths: List[str], context: str = ""):


def _process_knowledge_source(ksf: str, spec: str) -> Union[str, bool, Tuple]:
print("ksf", ksf)
print("spec", spec)
if ksf not in knowledge_provenance_properties:
log.warning("Unknown Knowledge Source Field: " + ksf + "... ignoring!")
log.debug("Unknown Knowledge Source Field: " + ksf + "... ignoring!")
return False
else:
if spec.lower() == "true":
@@ -391,8 +393,13 @@ def _process_knowledge_source(ksf: str, spec: str) -> Union[str, bool, Tuple]:
# assumed to be just a default string value for the knowledge source field
return spec_parts[0]
else:
print("spec_parts", spec_parts)
print("len(spec_parts)", len(spec_parts))
print("spec_parts[:2]", spec_parts[:2])
# assumed to be an InfoRes Tuple rewrite specification
if len(spec_parts) > 3:
print("spec_parts", spec_parts)
print("spec[:2]", spec[:2])
spec_parts = spec_parts[:2]
return tuple(spec_parts)

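The spec handling visible in the hunks above can be sketched in isolation. This is a minimal, KGX-free sketch: `parse_knowledge_source_spec` is a hypothetical name, and the `"false"` branch and the comma delimiter are assumptions, since those lines fall between the hunks and are not shown in this diff.

```python
from typing import Tuple, Union


def parse_knowledge_source_spec(spec: str) -> Union[str, bool, Tuple[str, ...]]:
    """Interpret a knowledge source spec string (hypothetical stand-in)."""
    lowered = spec.lower()
    if lowered == "true":
        return True
    # ASSUMPTION: "false" is handled symmetrically; that branch is not visible in the diff.
    if lowered == "false":
        return False
    # ASSUMPTION: tuple specifications are comma-delimited.
    spec_parts = spec.split(",")
    if len(spec_parts) == 1:
        # A single part is treated as a default string value for the field.
        return spec_parts[0]
    # Otherwise treated as an InfoRes rewrite tuple; as in the hunk,
    # more than three parts are truncated to the first two.
    if len(spec_parts) > 3:
        spec_parts = spec_parts[:2]
    return tuple(spec_parts)
```

Note that a four-part spec yields a two-element tuple here, exactly as the hunk is written; whether a slice of three was intended is a question for the author.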
@@ -541,11 +548,18 @@ def transform(
for ksf, spec in knowledge_sources:
ksf_spec = _process_knowledge_source(ksf, spec)
if isinstance(ksf_spec, tuple):
print("we've got a tuple")
print("ksf", ksf)
print("ksf_spec", ksf_spec)
print("source_dict", source_dict)
print("source_dict[input]", source_dict["input"])
if ksf not in source_dict["input"]:
source_dict["input"][ksf] = dict()
print("not in source dict", source_dict["input"][ksf])
if isinstance(source_dict["input"][ksf], dict):
key = ksf_spec[0]
source_dict["input"][ksf][key] = ksf_spec
print("in source dict now", source_dict["input"][ksf])
else:
# Unexpected condition - mixing static values with tuple specified rewrites?
raise RuntimeError(
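The way tuple specifications accumulate under `source_dict["input"][ksf]` in the hunk above can be sketched on a plain dictionary. `register_knowledge_source` is a hypothetical helper, the error message is illustrative, and the handling of non-tuple specs is an assumption because that branch is not visible in this diff.

```python
from typing import Any, Dict, Tuple, Union


def register_knowledge_source(
    input_args: Dict[str, Any],
    ksf: str,
    ksf_spec: Union[str, bool, Tuple[str, ...]],
) -> None:
    """Record one knowledge source spec in `input_args` (hypothetical helper)."""
    if isinstance(ksf_spec, tuple):
        # First rewrite tuple for this field: start an empty mapping.
        if ksf not in input_args:
            input_args[ksf] = {}
        if isinstance(input_args[ksf], dict):
            # Rewrite tuples are keyed by their first element.
            input_args[ksf][ksf_spec[0]] = ksf_spec
        else:
            # A static value was already stored for this field.
            raise RuntimeError(
                f"Cannot mix a static value with rewrite tuples for '{ksf}'"
            )
    else:
        # ASSUMPTION: non-tuple specs are stored directly as the field value.
        input_args[ksf] = ksf_spec
```

Several tuples for the same field end up side by side in one dictionary, while a tuple arriving after a static string raises.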
1 change: 1 addition & 0 deletions kgx/source/source.py
@@ -276,6 +276,7 @@ def set_edge_provenance(self, edge_data):
"""
Set a specific edge provenance value.
"""

self.infores_context.set_edge_provenance(edge_data)

def validate_node(self, node: Dict) -> Optional[Dict]:
1 change: 1 addition & 0 deletions kgx/source/tsv_source.py
@@ -270,6 +270,7 @@ def read_edge(self, edge: Dict) -> Optional[Tuple]:
edge_data["id"] = generate_uuid()
s = edge_data["subject"]
o = edge_data["object"]
print("edge_data: ", edge_data)
self.set_edge_provenance(edge_data)
key = generate_edge_key(s, edge_data["predicate"], o)
self.edge_properties.update(list(edge_data.keys()))
1 change: 1 addition & 0 deletions kgx/transformer.py
@@ -185,6 +185,7 @@ def transform(
filename = input_args.pop("filename", {})
for f in filename:
source = self.get_source(input_format)
print("source", source)
source.set_prefix_map(prefix_map)
if isinstance(source, RdfSource):
source.set_predicate_mapping(predicate_mappings)
10 changes: 9 additions & 1 deletion kgx/utils/infores.py
@@ -154,8 +154,10 @@ def _process_infores(source: str) -> str:

if self.filter:
infores = self.filter.sub(self.substr, source)
print("filter infores", infores)
else:
infores = source

infores = self.prefix + " " + infores
infores = infores.strip()
infores = infores.lower()
@@ -165,7 +167,6 @@ def _process_infores(source: str) -> str:
infores = re.sub(r"_", "-", infores)

infores = "infores:" + infores

return infores

def parser_list(sources: Optional[List[str]] = None) -> List[str]:
@@ -305,6 +306,7 @@ def set_provenance_map_entry(self, ksf_value: Any) -> Any:
else: # false, ignore this source?
mapping = self.default() # source suppressed
elif isinstance(ksf_value, (list, set, tuple)):
print(ksf_value)
mapping = self.processor(infores_rewrite_filter=ksf_value)
else:
mapping = ksf_value
@@ -392,6 +394,7 @@ def set_provenance(self, ksf: str, data: Dict):
# If data is a non-string iterable, then coerce it into a simple list of sources
if isinstance(data[ksf], (list, set, tuple)):
sources = list(data[ksf])
print("coerced sources", sources)
else:
# wraps knowledge sources that are multivalued in a list even if single valued
# in ingest data
@@ -440,6 +443,11 @@ def set_edge_provenance(self, edge_data: Dict):

"""
data_fields = list(edge_data.keys())
print("datafields", data_fields)
print("self.mapping", self.mapping)
print("knowledge_provenance_properties", knowledge_provenance_properties)

# edge provenance getting set twice, once in mapping, once in knowledge_provenance_properties
for ksf in data_fields:
if ksf in knowledge_provenance_properties:
self.set_provenance(ksf, edge_data)
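The normalization performed by `_process_infores` in the hunks above can be approximated as a standalone function. This is a sketch under stated assumptions: `to_infores` and its parameters are hypothetical names, and the whitespace and punctuation steps are guesses at the lines collapsed between the two hunks. Only the filter substitution, prefixing, stripping, lowercasing, underscore-to-hyphen replacement, and the final `infores:` prefix are visible in this diff.

```python
import re
from typing import Optional


def to_infores(
    source: str,
    prefix: str = "",
    pattern: Optional[re.Pattern] = None,
    substr: str = "",
) -> str:
    """Turn a free-text source name into an infores-style CURIE (sketch)."""
    # Optional regex rewrite of the raw source name.
    infores = pattern.sub(substr, source) if pattern else source
    infores = (prefix + " " + infores).strip().lower()
    # ASSUMPTION: runs of whitespace become underscores (collapsed in the diff).
    infores = re.sub(r"\s+", "_", infores)
    # ASSUMPTION: remaining non-word characters are dropped (collapsed in the diff).
    infores = re.sub(r"[^\w]", "", infores)
    # Visible in the diff: underscores become hyphens, then the prefix is added.
    infores = re.sub(r"_", "-", infores)
    return "infores:" + infores
```

For example, a filter that strips a parenthesized version suffix, combined with a prefix, turns a descriptive source name into a compact hyphenated identifier.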
9 changes: 8 additions & 1 deletion kgx/utils/kgx_utils.py
@@ -785,7 +785,7 @@ def generate_edge_identifiers(graph: BaseGraph):
data["id"] = generate_uuid()


def sanitize_import(data: Dict, list_delimiter: str=None) -> Dict:
def sanitize_import(data: Dict, list_delimiter: str = None) -> Dict:
"""
Sanitize key-value pairs in dictionary.
This should be used to ensure proper syntax and types for node and edge data as it is imported.
@@ -846,6 +846,13 @@ def _sanitize_import_property(key: str, value: Any, list_delimiter: str) -> Any:
new_value = [x for x in value.split(list_delimiter) if x] if list_delimiter else value
else:
new_value = [str(value).replace("\n", " ").replace("\t", " ")]

# remove duplication in the list
value_set: Set = set()
for entry in new_value:
value_set.add(entry)
new_value = list(value_set)

elif column_types[key] == bool:
try:
new_value = bool(value)
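The de-duplication added to `_sanitize_import_property` passes the values through a `set`, which does not guarantee that the resulting list keeps the original order. If ordering matters downstream, one possible alternative (not part of this PR) is sketched below; `dedupe_preserving_order` is a hypothetical name, and the approach relies on dictionaries preserving insertion order, which holds for the Python versions this PR targets.

```python
from typing import Hashable, Iterable, List


def dedupe_preserving_order(values: Iterable[Hashable]) -> List[Hashable]:
    """Drop repeated entries while keeping the first occurrence of each."""
    # dict keys are unique and keep insertion order, so this removes
    # duplicates without reordering the remaining entries.
    return list(dict.fromkeys(values))
```

Applied to `["b", "a", "b", "c", "a"]`, this keeps `"b"`, `"a"`, `"c"` in that order, whereas a round trip through `set` may return them in any order.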
11 changes: 1 addition & 10 deletions kgx/validator.py
@@ -430,7 +430,7 @@ def validate_node_property_types(
element = toolkit.get_element(key)
if element:
if hasattr(element, "typeof"):
if (element.typeof == "string" and not isinstance(value, str)) or ( \
if (element.typeof == "string" and not isinstance(value, str)) or (
element.typeof == "double" and not isinstance(
value, (int, float)
)):
@@ -448,15 +448,6 @@ def validate_node_property_types(
f"Skipping validation for Node property '{key}'. "
f"Expected type '{element.typeof}' v/s Actual type '{type(value)}'"
)
if hasattr(element, "multivalued"):
if element.multivalued:
if not isinstance(value, list):
message = f"Multi-valued node property '{key}' is expected to be of type '{list}'"
self.log_error(node, error_type, message, MessageLevel.ERROR)
else:
if isinstance(value, (list, set, tuple)):
message = f"Single-valued node property '{key}' is expected to be of type '{str}'"
self.log_error(node, error_type, message, MessageLevel.ERROR)

def validate_edge_property_types(
self,
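For context, the cardinality rule in the block that appears in the `validate_node_property_types` hunk above can be restated as a standalone function. This is only a sketch: `check_cardinality` is a hypothetical name, and it returns the message as a string instead of calling the validator's `log_error`.

```python
from typing import Any, Optional


def check_cardinality(key: str, value: Any, multivalued: bool) -> Optional[str]:
    """Return an error message if `value` does not match the slot's cardinality."""
    if multivalued:
        # A multivalued slot is only accepted as a list.
        if not isinstance(value, list):
            return f"Multi-valued node property '{key}' is expected to be of type '{list}'"
    elif isinstance(value, (list, set, tuple)):
        # A single-valued slot must not hold a collection.
        return f"Single-valued node property '{key}' is expected to be of type '{str}'"
    return None
```

Under this rule a scalar supplied for a multivalued slot is reported as an error, which appears to be the case the PR title describes as spurious.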