updated text

rapidsai · Jun 29, 2022 · b55921a
1 parent 399c10c
commit b55921a
Showing 1 changed file with 59 additions and 9 deletions.
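
The replication idea described in the notebook's "Approach" cell can be sketched in plain Python, without cuGraph. This is only an illustration under stated assumptions: `dijkstra` and `cost_matrix_by_replication` are hypothetical helpers, one graph copy is created per seed, the offset between copies is assumed to equal the number of vertices, and the ghost-node edges are assumed to have weight 1 (which is why 1 is subtracted from every distance, as in `build_cost_matrix`).

```python
import heapq


def dijkstra(adj, source):
    """Return {vertex: distance} for every vertex reachable from source."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def cost_matrix_by_replication(edges, num_vertices):
    """All-source shortest paths from ONE single-source run.

    edges: list of (src, dst, weight) with zero-based vertex ids.
    Assumption: offset == num_vertices, so copy k occupies ids
    [k * offset, (k + 1) * offset) and the copies stay disjoint.
    """
    offset = num_vertices
    ghost = num_vertices * offset  # first id past all copies
    adj = {}
    for seed in range(num_vertices):
        base = seed * offset
        # Copy the whole graph into its own index range.
        for src, dst, w in edges:
            adj.setdefault(base + src, []).append((base + dst, w))
        # Ghost node reaches the seed of this copy with weight 1 (assumed).
        adj.setdefault(ghost, []).append((base + seed, 1))

    dist = dijkstra(adj, ghost)

    cost = {}
    for vertex, d in dist.items():
        if vertex == ghost:
            continue
        # Decode which copy (seed) and which original vertex this id is.
        seed, v2 = divmod(vertex, offset)
        cost[(seed, v2)] = d - 1  # remove the ghost edge weight
    return cost


edges = [(0, 1, 2), (1, 2, 3), (0, 2, 10), (2, 0, 1)]
cost = cost_matrix_by_replication(edges, 3)
```

Unlike the notebook function, this sketch keeps the zero-cost seed-to-itself entries. Unreachable pairs are simply absent from `cost`.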
diff --git a/notebooks/applications/CostMatrix.ipynb b/notebooks/applications/CostMatrix.ipynb
@@ -4,26 +4,70 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# How to compute a cost matrix by replicating data\n",
+    "# How to compute a _Cost Matrix_ by replicating data\n",
     "\n",
-    "cuGraph currently does not have a All-Source Shortest Path (ASSP) algorithm.  One is on the roadmap, but that doesn't help us today.  If the graph to be processed is small, then it is possible to run ASSP by creating a lot of copies of the graph and running the Single Source Shortest Path (SSSP) on one seed per graph copy. The ASSP with edge weights summed along each path results in the cost to move between any two nodes in the graph. That is the cost matrix.\n",
+    "### Approach\n",
+    "A simple way to create a cost matrix is to run All-Source Shortest Path (ASSP). However, cuGraph currently does not have an ASSP algorithm.  One based on Floyd-Warshall is on the roadmap, but that doesn't help us today. Luckily, there is a workaround if the graph to be processed is small: create many copies of the graph and run Single Source Shortest Path (SSSP) with one seed per graph copy. Since each SSSP runs within its own disjoint component, there is no issue with path collisions between seeds.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Notebook Organization\n",
+    "The first portion of the notebook discusses each step independently.  It gives insight into what is going on and how long each step takes.\n",
     "\n",
+    "The second section puts all the steps together in a single function and times how long it takes to compute the matrix.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Data\n",
+    "\n",
+    "In this notebook we will use the email-Eu-core dataset.\n",
+    "\n",
+    "* Number of Vertices:  1,005\n",
+    "* Number of Edges:    25,571\n",
+    "\n",
+    "We are using this dataset since it is small and has a few communities, meaning that there are paths to be found."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Notebook Revisions\n",
     "\n",
     "| Author Credit |    Date    |  Update          | cuGraph Version |  Test Hardware |\n",
     "| --------------|------------|------------------|-----------------|----------------|\n",
     "| Brad Rees     | 06/21/2022 | created          | 22.08           | V100 w 32 GB, CUDA 11.5\n",
     "| Don Acosta    | 06/28/2022 | modified         | 22.08           | V100 w 32 GB, CUDA 11.5"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### References\n",
+    "\n",
+    "* https://www.sciencedirect.com/topics/mathematics/cost-matrix\n",
+    "* https://en.wikipedia.org/wiki/Shortest_path_problem\n",
+    "\n",
+    "Dataset\n",
+    "* Hao Yin, Austin R. Benson, Jure Leskovec, and David F. Gleich. Local Higher-order Graph Clustering. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2017.\n",
+    "\n",
+    "* J. Leskovec, J. Kleinberg and C. Faloutsos. Graph Evolution: Densification and Shrinking Diameters. ACM Transactions on Knowledge Discovery from Data (ACM TKDD), 1(1), 2007. http://www.cs.cmu.edu/~jure/pubs/powergrowth-tkdd.pdf\n"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
     "# system and other\n",
-    "import gc\n",
-    "import os\n",
     "import time\n",
     "from time import perf_counter\n",
     "import math\n",
@@ -38,7 +82,7 @@
    "metadata": {},
    "source": [
     "-----\n",
-    "# The first section discuss each step independently.  The second section puts it all together\n",
+    "# Reading the data\n",
     "\n",
     "Let's start with data read"
    ]
@@ -76,7 +120,7 @@
    "metadata": {},
    "source": [
     "### Read the data and verify that it is zero based (e.g. first vertex is 0)\n",
-    "**IMPORTANT:** The node numbering must be zero based or else the single-source seed (offset) in each copy of the graph will not be correct and there will not be all-source coverage in the cost matrix."
+    "**IMPORTANT:** The node numbering must be zero based. The starting index of each replicated graph is set to one larger than the number of vertices.  If the starting index is not zero, then the graph copies will overlap in index space and will not be independent (disjoint)."
    ]
   },
   {
@@ -493,7 +537,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Now do it all in a single function"
+    "----\n",
+    "# Section 2: Do it all in a single function"
    ]
   },
   {
@@ -502,6 +547,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
+    "# Set the number of replications: N = 10 will produce 1,024 (2^10) graphs\n",
     "N = 10"
    ]
   },
@@ -514,15 +560,19 @@
     "def build_cost_matrix(_gdf):\n",
     "    data = make_data(_gdf, N)\n",
     "    gdf_with_ghost, ghost_id = add_ghost_node(data, N)\n",
+    "    \n",
     "    G = cugraph.Graph(directed=True)\n",
     "    G.from_cudf_edgelist(gdf_with_ghost, source='src', destination='dst', renumber=False)\n",
+    "    \n",
     "    X = cugraph.sssp(G, ghost_id)\n",
+    "    \n",
     "    X = X[X['predecessor'] != ghost_id]\n",
     "    X = cugraph.filter_unreachable(X)\n",
     "    X['distance'] -= 1\n",
     "    X['seed'] = (X['vertex'] / offset).astype(int)\n",
     "    X['v2'] = X['vertex'] - (X['seed'] * offset)\n",
     "    cost = X.drop(columns=['vertex', 'predecessor'])\n",
+    "    \n",
     "    return cost"
    ]
   },
@@ -563,9 +613,9 @@
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "Python 3.8.13 ('cugraph_dev')",
+   "display_name": "cugraph_dev",
    "language": "python",
-   "name": "python3"
+   "name": "cugraph_dev"
   },
   "language_info": {
    "codemirror_mode": {