Merge pull request #267 from Mec-iS/issue-261-graph-algebra
Starting graph algebra
Showing 8 changed files with 533 additions and 24 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,199 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Graph Algebra with `kglab`\n", | ||
"\n", | ||
"## intro\n", | ||
"`kglab` provides tools to access graph data from multiple sources and build a `KnowledgeGraph` that data scientists can use easily. For a thorough explanation of how to work with data stored as triples and how to load it into `kglab`, please see the examples in the `examples/` directory. The examples in this directory (`examples/graph_algebra/`) introduce the graph algebra capabilities that can be applied to the graphs the user has loaded.\n", | ||
"\n", | ||
"## basic load and querying\n", | ||
"The steps below load data into a `KnowledgeGraph` and then prepare a query against it:\n", | ||
"\n", | ||
"1. Instantiate a graph from a dataset:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 1, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"<kglab.kglab.KnowledgeGraph at 0x7f283f3d3940>" | ||
] | ||
}, | ||
"execution_count": 1, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"# for use in tutorial and development; do not include this `sys.path` change in production:\n", | ||
"import sys ; sys.path.insert(0, \"../../\")\n", | ||
"from os.path import dirname\n", | ||
"import kglab\n", | ||
"import os\n", | ||
"\n", | ||
"namespaces = {\n", | ||
" \"foaf\": \"http://xmlns.com/foaf/0.1/\",\n", | ||
" \"gorm\": \"http://example.org/sagas#\",\n", | ||
" \"rel\": \"http://purl.org/vocab/relationship/\",\n", | ||
" }\n", | ||
"\n", | ||
"kg = kglab.KnowledgeGraph(\n", | ||
" name = \"Happy Vikings KG example for SKOS/OWL inference\",\n", | ||
" namespaces=namespaces,\n", | ||
" )\n", | ||
"\n", | ||
"kg.load_rdf(dirname(dirname(os.getcwd())) + \"/dat/gorm.ttl\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"\n", | ||
"2. Define a SPARQL query that selects a \"subject\" and an \"object\"; it will be used to create a subgraph:\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 2, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"query = \"\"\"SELECT ?subject ?object\n", | ||
"WHERE {\n", | ||
" ?subject rdf:type gorm:Viking .\n", | ||
" ?subject gorm:childOf ?object .\n", | ||
"}\n", | ||
"\"\"\"" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"\n", | ||
"## define a subgraph\n", | ||
"In this case we are looking for the network of parent-child relations among the members of a Viking family.\n", | ||
"\n", | ||
"With this query we can define a **subgraph**, which gives access to the **graph algebra** capabilities:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 3, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from kglab.subg import SubgraphMatrix\n", | ||
"\n", | ||
"subgraph = SubgraphMatrix(kg=kg, sparql=query)\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## compute adjacency matrices\n", | ||
"Let's compute the basic adjacency matrix (usually denoted `A`):" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 4, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"array([[0., 1., 1., 0., 0.],\n", | ||
" [0., 0., 0., 1., 0.],\n", | ||
" [0., 0., 0., 0., 0.],\n", | ||
" [0., 0., 0., 0., 1.],\n", | ||
" [0., 0., 0., 0., 0.]])" | ||
] | ||
}, | ||
"execution_count": 4, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"adj_matrix = subgraph.to_adjacency()\n", | ||
"adj_matrix" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"What happened here is that all the subjects and objects have been mapped to integer indices, from 0 to the number of nodes minus one. We can see that the entity with index 0 is adjacent to (has a directed edge to) the entity with index 1. This is a directed graph because the relationship `gorm:childOf` goes from child to parent. Let's turn it into an undirected graph to see the relation symmetrically (both child-parent and parent-child)." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 6, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"array([[0., 1., 1., 0., 0.],\n", | ||
" [1., 0., 0., 1., 0.],\n", | ||
" [1., 0., 0., 0., 0.],\n", | ||
" [0., 1., 0., 0., 1.],\n", | ||
" [0., 0., 0., 1., 0.]])" | ||
] | ||
}, | ||
"execution_count": 6, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"undirected_adj_mtx = subgraph.to_undirected()\n", | ||
"undirected_adj_mtx" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"We can now see that the relationship is a generic, symmetric \"parenthood\" relation, not just a directed child-parent relationship." | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3.8.10 64-bit ('.venv': venv)", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.8.10" | ||
}, | ||
"orig_nbformat": 4, | ||
"vscode": { | ||
"interpreter": { | ||
"hash": "de68f9b565e1e230f4433adb1a318d8f3a0dfad0917fa0c696727472c8ddadbf" | ||
} | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
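The index mapping described in the notebook can be sketched without `kglab` at all. The snippet below is a minimal illustration, not the library's implementation: the node labels `n0` to `n4` are hypothetical stand-ins for the rows a SPARQL query would return, and assigning indices in order of first appearance is an assumption made for the example.

```python
import numpy as np

# Hypothetical (subject, object) rows standing in for SPARQL query results.
pairs = [("n0", "n1"), ("n0", "n2"), ("n1", "n3"), ("n3", "n4")]

# Assumption for this sketch: each distinct label gets an integer index
# in order of first appearance.
index = {}
for subj, obj in pairs:
    for label in (subj, obj):
        index.setdefault(label, len(index))

# Directed adjacency matrix: adj[i, j] = 1 when there is an edge i -> j.
n = len(index)
adj = np.zeros((n, n))
for subj, obj in pairs:
    adj[index[subj], index[obj]] = 1.0

# Undirected view: keep an edge if it exists in either direction.
undirected = np.maximum(adj, adj.T)
```

With these four edges, `adj` and `undirected` have the same entries as the two arrays shown in the notebook outputs.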
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,65 @@ | ||
""" | ||
Working with `SubgraphMatrix` as vectorized representation. | ||
Additions to functionalities present in `subg.py`. | ||
Integrate `scipy` and `scikit-learn` functionalities. | ||
see license https://github.com/DerwenAI/kglab#license-and-copyright | ||
""" | ||
import typing | ||
|
||
import networkx as nx | ||
from networkx import DiGraph | ||
|
||
class AlgebraMixin: | ||
    """ | ||
    Provides methods to work with graph algebra using `SubgraphMatrix` data. | ||
    NOTE: provide optional Oxigraph support for fast in-memory computation | ||
    """ | ||
    nx_graph: typing.Optional[DiGraph] = None | ||
|
||
    def to_undirected(self): | ||
        """ | ||
        Return adjacency (dense) matrix for the KG, with the graph turned into undirected. | ||
        returns: | ||
        `numpy.array`: the array representation in `numpy` standard | ||
        """ | ||
        self.check_attributes() | ||
        return nx.to_numpy_array(self.nx_graph.to_undirected()) | ||
|
||
    def to_adjacency(self): | ||
        """ | ||
        Return adjacency (dense) matrix for the KG. | ||
        [Relevant NetworkX interface](https://networkx.org/documentation/stable/reference/convert.html#id2) | ||
        returns: | ||
        `numpy.array`: the array representation in `numpy` standard | ||
        """ | ||
        self.check_attributes() | ||
        return nx.to_numpy_array(self.nx_graph) | ||
|
||
    def to_incidence(self): | ||
        """ | ||
        Return incidence (dense) matrix for the KG. | ||
        [Relevant scipy docs](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.html) | ||
        returns: | ||
        `numpy.array`: the array representation in `numpy` standard | ||
        """ | ||
        self.check_attributes() | ||
        return nx.incidence_matrix(self.nx_graph).toarray() | ||
|
||
    def to_laplacian(self): | ||
        """ | ||
        Return Laplacian matrix for the KG. Graph is turned into undirected. | ||
        [docs](https://networkx.org/documentation/stable/reference/generated/networkx.linalg.laplacianmatrix.laplacian_matrix.html) | ||
        returns: | ||
        `numpy.array`: the array representation in `numpy` standard | ||
        """ | ||
        self.check_attributes() | ||
        return nx.laplacian_matrix(self.nx_graph.to_undirected()).toarray() | ||
|
||
    def to_scipy_sparse(self): | ||
        """ | ||
        Return graph in CSR format (optimized for matrix-matrix operations). | ||
        returns: | ||
        SciPy sparse matrix: Graph adjacency matrix. | ||
        """ | ||
        self.check_attributes() | ||
        return nx.to_scipy_sparse_array(self.nx_graph) |
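To make the output of `to_laplacian` concrete, here is a minimal NumPy-only sketch of the unnormalized Laplacian `L = D - A` computed on the symmetrized graph. It starts from the directed adjacency matrix printed in the notebook and does not call `kglab` or NetworkX; the variable names are illustrative only.

```python
import numpy as np

# Directed adjacency matrix copied from the notebook output.
adj = np.array([
    [0., 1., 1., 0., 0.],
    [0., 0., 0., 1., 0.],
    [0., 0., 0., 0., 0.],
    [0., 0., 0., 0., 1.],
    [0., 0., 0., 0., 0.],
])

# Symmetrize: an edge in either direction becomes an undirected edge.
undirected = np.maximum(adj, adj.T)

# Degree matrix D: the degree of each node on the diagonal.
degree = np.diag(undirected.sum(axis=1))

# Unnormalized graph Laplacian L = D - A.
laplacian = degree - undirected
```

Every row of `laplacian` sums to zero, which is a quick sanity check for any Laplacian built this way.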
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,43 @@ | ||
""" | ||
Working with `SubgraphMatrix` as vectorized representation. | ||
Additions to functionalities present in `subg.py`. | ||
Integrate `scikit-network` functionalities. | ||
see license https://github.com/DerwenAI/kglab#license-and-copyright | ||
""" | ||
|
||
import sknetwork as skn | ||
|
||
class NetAnalysisMixin: | ||
    """ | ||
    Provides methods for network analysis tools to work with `KnowledgeGraph`. | ||
    """ | ||
    def get_distances(self, adj_mtx): | ||
        """ | ||
        Compute distances according to an adjacency matrix. | ||
        """ | ||
        self.check_attributes() | ||
        return skn.path.get_distances(adj_mtx) | ||
|
||
    def get_shortest_path(self, adj_mtx, src, dst): | ||
        """ | ||
        Return shortest path from sources to destinations according to an adjacency matrix. | ||
        adj_mtx: | ||
        numpy.array: adjacency matrix for the graph. | ||
        src: | ||
        int or iterable: indices of source nodes | ||
        dst: | ||
        int or iterable: indices of destination nodes | ||
        returns: | ||
        list of int: a path of indices | ||
        """ | ||
        self.check_attributes() | ||
        return skn.path.get_shortest_path(adj_mtx, src, dst) | ||
|
||
|
||
# number of nodes, number of edges | ||
# density | ||
# triangles | ||
# reciprocity |
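For intuition about what "distances according to an adjacency matrix" means, the sketch below runs a plain breadth-first search over a dense adjacency matrix and returns hop counts from one source node. It is an illustration only and is not how `scikit-network` computes distances; the function name `bfs_distances` and the `-1` marker for unreachable nodes are choices made for this example.

```python
from collections import deque

import numpy as np


def bfs_distances(adj, source):
    """Return hop counts from `source` to every node; -1 means unreachable."""
    n = adj.shape[0]
    dist = [-1] * n
    dist[source] = 0
    queue = deque([source])
    while queue:
        node = queue.popleft()
        # Nonzero entries in row `node` are the nodes it points to.
        for neighbor in np.flatnonzero(adj[node]):
            neighbor = int(neighbor)
            if dist[neighbor] == -1:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


# Directed adjacency matrix copied from the notebook output.
adj = np.array([
    [0., 1., 1., 0., 0.],
    [0., 0., 0., 1., 0.],
    [0., 0., 0., 0., 0.],
    [0., 0., 0., 0., 1.],
    [0., 0., 0., 0., 0.],
])

distances = bfs_distances(adj, 0)  # [0, 1, 1, 2, 3]
```

Because the matrix is directed, starting from node 2 (which has no outgoing edges) reaches nothing else; passing the symmetrized matrix instead would give distances in the undirected graph.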