Commit

Merge pull request #267 from Mec-iS/issue-261-graph-algebra
Starting graph algebra
Mec-iS authored Sep 6, 2022
2 parents 3814e8a + 3687dfc commit 1c78a6a
Showing 8 changed files with 533 additions and 24 deletions.
199 changes: 199 additions & 0 deletions examples/graph_algebra/gla_ex0_0.ipynb
@@ -0,0 +1,199 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Graph Algebra with `kglab`\n",
"\n",
"## intro\n",
"`kglab` provides tools to access graph data from multiple sources and build a `KnowledgeGraph` that data scientists can easily work with. For a thorough explanation of how to use data stored as triples and how to load it into `kglab`, see the examples in the `examples/` directory. The examples in this directory (`examples/graph_algebra/`) introduce the graph algebra capabilities available for the graphs the user has loaded.\n",
"\n",
"## basic load and querying\n",
"First, load your data into a `KnowledgeGraph`, for example:\n",
"\n",
"1. Instantiate a graph from a dataset:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<kglab.kglab.KnowledgeGraph at 0x7f283f3d3940>"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# for use in tutorial and development; do not include this `sys.path` change in production:\n",
"import sys ; sys.path.insert(0, \"../../\")\n",
"from os.path import dirname\n",
"import kglab\n",
"import os\n",
"\n",
"namespaces = {\n",
" \"foaf\": \"http://xmlns.com/foaf/0.1/\",\n",
" \"gorm\": \"http://example.org/sagas#\",\n",
" \"rel\": \"http://purl.org/vocab/relationship/\",\n",
" }\n",
"\n",
"kg = kglab.KnowledgeGraph(\n",
" name = \"Happy Vikings KG example for SKOS/OWL inference\",\n",
" namespaces=namespaces,\n",
" )\n",
"\n",
"kg.load_rdf(dirname(dirname(os.getcwd())) + \"/dat/gorm.ttl\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"2. Create a subgraph by providing a SPARQL query that returns `?subject` and `?object` bindings:\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"query = \"\"\"SELECT ?subject ?object\n",
"WHERE {\n",
" ?subject rdf:type gorm:Viking .\n",
" ?subject gorm:childOf ?object .\n",
"}\n",
"\"\"\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## define a subgraph\n",
"In this case we are looking for the network of parent-child relations among members of a Viking family.\n",
"\n",
"With this query we can define a **subgraph**, which gives access to **graph algebra** capabilities:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"from kglab.subg import SubgraphMatrix\n",
"\n",
"subgraph = SubgraphMatrix(kg=kg, sparql=query)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## compute adjacency matrices\n",
"Let's compute the basic adjacency matrix (usually denoted `A`):"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[0., 1., 1., 0., 0.],\n",
" [0., 0., 0., 1., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 1.],\n",
" [0., 0., 0., 0., 0.]])"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"adj_matrix = subgraph.to_adjacency()\n",
"adj_matrix"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What happened here is that every subject and object has been mapped to an integer index from 0 to n - 1, where n is the number of nodes. Reading row 0, the entity with index 0 is adjacent to (has a directed edge to) the entities with indices 1 and 2. The graph is directed because the relationship `gorm:childOf` points from child to parent. Let's turn it into an undirected graph to view the relation symmetrically (both child-parent and parent-child):"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[0., 1., 1., 0., 0.],\n",
" [1., 0., 0., 1., 0.],\n",
" [1., 0., 0., 0., 0.],\n",
" [0., 1., 0., 0., 1.],\n",
" [0., 0., 0., 1., 0.]])"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"undirected_adj_mtx = subgraph.to_undirected()\n",
"undirected_adj_mtx"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now see that the relationship is a generic, symmetric \"parenthood\" relation rather than only a directed child-to-parent relationship."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.8.10 64-bit ('.venv': venv)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "de68f9b565e1e230f4433adb1a318d8f3a0dfad0917fa0c696727472c8ddadbf"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}
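The notebook steps can be approximated with plain NetworkX, which `AlgebraMixin` calls under the hood. This is a minimal sketch, not `kglab` code: the (child, parent) pairs below are hypothetical stand-ins for the rows returned by the SPARQL query, chosen so the matrices have the same pattern as the notebook output. It also assumes nodes are indexed in order of first appearance, which may differ from how `SubgraphMatrix` assigns indices.

```python
import networkx as nx
import numpy as np

# Hypothetical (child, parent) pairs standing in for the `?subject ?object`
# rows of the `gorm:childOf` query; NOT the real contents of gorm.ttl.
pairs = [
    ("Astrid", "Bodil"),
    ("Astrid", "Leif"),
    ("Bodil", "Eric"),
    ("Eric", "Harald"),
]

# Map every distinct node to an integer index, in order of first appearance.
index = {}
for subject, obj in pairs:
    for node in (subject, obj):
        index.setdefault(node, len(index))

# Directed graph over the integer indices: one child -> parent edge per pair.
g = nx.DiGraph()
g.add_nodes_from(range(len(index)))
g.add_edges_from((index[s], index[o]) for s, o in pairs)

# Dense adjacency matrix A: A[i, j] == 1.0 when node i has an edge to node j.
adj = nx.to_numpy_array(g)

# Undirected view: each edge appears in both directions, so the matrix is
# symmetric; with no reciprocal edges or self-loops it equals A + A.T.
undirected = nx.to_numpy_array(g.to_undirected())
assert np.array_equal(undirected, adj + adj.T)

print(adj)
print(undirected)
```

Symmetrizing with `A + A.T` is only equivalent to `to_undirected()` here because no pair of nodes has edges in both directions; with reciprocal edges the sum would contain 2s while NetworkX keeps a single undirected edge.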
65 changes: 65 additions & 0 deletions kglab/algebra.py
@@ -0,0 +1,65 @@
"""
Working with `SubgraphMatrix` as vectorized representation.
Additions to functionalities present in `subg.py`.
Integrate `scipy` and `scikit-learn` functionalities.
see license https://github.com/DerwenAI/kglab#license-and-copyright
"""
import typing

import networkx as nx
from networkx import DiGraph

class AlgebraMixin:
    """
    Provides methods to work with graph algebra using `SubgraphMatrix` data.
    NOTE: provide optional Oxigraph support for fast in-memory computation
    """
    nx_graph: typing.Optional[DiGraph] = None

    def to_undirected(self):
        """
        Return adjacency (dense) matrix for the undirected version of the KG.
        returns:
            `numpy.array`: the array representation in `numpy` standard
        """
        self.check_attributes()
        return nx.to_numpy_array(self.nx_graph.to_undirected())

    def to_adjacency(self):
        """
        Return adjacency (dense) matrix for the KG.
        [Relevant NetworkX interface](https://networkx.org/documentation/stable/reference/convert.html#id2)
        returns:
            `numpy.array`: the array representation in `numpy` standard
        """
        self.check_attributes()
        return nx.to_numpy_array(self.nx_graph)

    def to_incidence(self):
        """
        Return incidence (dense) matrix for the KG.
        [Relevant scipy docs](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.html)
        returns:
            `numpy.array`: the array representation in `numpy` standard
        """
        self.check_attributes()
        return nx.incidence_matrix(self.nx_graph).toarray()

    def to_laplacian(self):
        """
        Return Laplacian matrix for the KG. The graph is first converted to undirected.
        [docs](https://networkx.org/documentation/stable/reference/generated/networkx.linalg.laplacianmatrix.laplacian_matrix.html)
        returns:
            `numpy.array`: the array representation in `numpy` standard
        """
        self.check_attributes()
        return nx.laplacian_matrix(self.nx_graph.to_undirected()).toarray()

    def to_scipy_sparse(self):
        """
        Return the graph adjacency matrix in CSR (compressed sparse row) format,
        which is efficient for matrix arithmetic.
        returns:
            SciPy sparse array: graph adjacency matrix.
        """
        self.check_attributes()
        return nx.to_scipy_sparse_array(self.nx_graph)
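To sanity-check what `to_laplacian()`, `to_incidence()` and `to_scipy_sparse()` return, the sketch below calls the same NetworkX functions directly on a small hypothetical graph (not `kglab` data) and checks two standard identities for the undirected view: the Laplacian is L = D - A, and the unoriented incidence matrix B satisfies B Bᵀ = D + A. It assumes NetworkX 2.7 or later (for `to_scipy_sparse_array`) with SciPy installed. Note that `to_incidence()` itself is applied to the directed graph; the undirected view is used here only so the identity holds.

```python
import networkx as nx
import numpy as np

# Small hypothetical directed graph with child -> parent style edges.
g = nx.DiGraph([(0, 1), (0, 2), (1, 3), (3, 4)])
u = g.to_undirected()

# Adjacency A and diagonal degree matrix D of the undirected view.
A = nx.to_numpy_array(u)
D = np.diag(A.sum(axis=1))

# Same call used by AlgebraMixin.to_laplacian(): L = D - A.
L = nx.laplacian_matrix(u).toarray()
assert np.array_equal(L, D - A)

# Same call used by AlgebraMixin.to_incidence(): one row per node, one column
# per edge, unoriented by default, so entries are 0 or 1.
B = nx.incidence_matrix(u).toarray()
assert np.array_equal(B @ B.T, D + A)

# Same call used by AlgebraMixin.to_scipy_sparse(): CSR is the default format.
S = nx.to_scipy_sparse_array(g)
print(S.format)  # csr
```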
43 changes: 43 additions & 0 deletions kglab/networks.py
@@ -0,0 +1,43 @@
"""
Working with `SubgraphMatrix` as vectorized representation.
Additions to functionalities present in `subg.py`.
Integrate `scikit-network` functionalities.
see license https://github.com/DerwenAI/kglab#license-and-copyright
"""

import sknetwork as skn

class NetAnalysisMixin:
    """
    Provides methods for network analysis tools to work with `KnowledgeGraph`.
    """
    def get_distances(self, adj_mtx):
        """
        Compute distances according to an adjacency matrix.
        adj_mtx:
            numpy.array: adjacency matrix for the graph.
        """
        self.check_attributes()
        return skn.path.get_distances(adj_mtx)

    def get_shortest_path(self, adj_mtx, src, dst):
        """
        Return the shortest path from sources to destinations according to an adjacency matrix.
        adj_mtx:
            numpy.array: adjacency matrix for the graph.
        src:
            int or iterable: indices of source nodes
        dst:
            int or iterable: indices of destination nodes
        returns:
            list of int: a path of indices
        """
        self.check_attributes()
        return skn.path.get_shortest_path(adj_mtx, src, dst)


# number of nodes, number of edges
# density
# triangles
# reciprocity
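To illustrate what these distance and shortest-path computations produce without depending on a particular scikit-network release, here is a minimal sketch that swaps in SciPy's `scipy.sparse.csgraph.shortest_path` (this is not what `NetAnalysisMixin` calls). It reuses the undirected adjacency matrix printed in the notebook; `path_between` is a hypothetical helper that recovers a node path from SciPy's predecessor matrix.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path

# Undirected adjacency matrix from the notebook output.
adj = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 0, 1, 0],
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

# All-pairs hop counts, plus predecessors so individual paths can be rebuilt.
dist, pred = shortest_path(
    csr_matrix(adj), directed=False, unweighted=True, return_predecessors=True
)


def path_between(pred, src, dst):
    """Hypothetical helper: walk predecessors back from dst to src."""
    path = [dst]
    while path[-1] != src:
        prev = pred[src, path[-1]]
        if prev < 0:  # SciPy marks "no path" with -9999
            return []
        path.append(int(prev))
    return path[::-1]


print(dist[2, 4])               # 4.0
print(path_between(pred, 2, 4))  # [2, 0, 1, 3, 4]
```

In `pred`, entry `[i, j]` is the node that precedes `j` on a shortest path starting at `i`, which is why the helper walks backwards from the destination and reverses the result.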
10 changes: 7 additions & 3 deletions kglab/query/mixin.py
@@ -81,7 +81,8 @@ def query_as_df (
pythonify: bool = True,
) -> pd.DataFrame:
"""
-Wrapper for [`rdflib.Graph.query()`](https://rdflib.readthedocs.io/en/stable/apidocs/rdflib.html?highlight=query#rdflib.Graph.query) to perform a SPARQL query on the RDF graph.
+Wrapper for [`rdflib.Graph.query()`](https://rdflib.readthedocs.io/en/stable/apidocs/rdflib.html?highlight=query#rdflib.Graph.query)
+to perform a SPARQL query on the RDF graph.
sparql:
text for the SPARQL query
@@ -123,7 +124,8 @@ def visualize_query (
notebook: bool = False,
) -> pyvis.network.Network:
"""
-Visualize the given SPARQL query as a [`pyvis.network.Network`](https://pyvis.readthedocs.io/en/latest/documentation.html#pyvis.network.Network)
+Visualize the given SPARQL query as a
+[`pyvis.network.Network`](https://pyvis.readthedocs.io/en/latest/documentation.html#pyvis.network.Network)
sparql:
input SPARQL query to be visualized
@@ -144,7 +146,9 @@ def n3fy (
pythonify: bool = True,
) -> typing.Any:
"""
-Wrapper for RDFlib [`n3()`](https://rdflib.readthedocs.io/en/stable/utilities.html?highlight=n3#serializing-a-single-term-to-n3) and [`toPython()`](https://rdflib.readthedocs.io/en/stable/apidocs/rdflib.html?highlight=toPython#rdflib.Variable.toPython) to serialize a node into a human-readable representation using N3 format.
+Wrapper for RDFlib [`n3()`](https://rdflib.readthedocs.io/en/stable/utilities.html?highlight=n3#serializing-a-single-term-to-n3)
+and [`toPython()`](https://rdflib.readthedocs.io/en/stable/apidocs/rdflib.html?highlight=toPython#rdflib.Variable.toPython)
+to serialize a node into a human-readable representation using N3 format.
node:
must be a [`rdflib.term.Node`](https://rdflib.readthedocs.io/en/stable/apidocs/rdflib.html?highlight=Node#rdflib.term.Node)