Commit 927513d: first commit
NicoRenaud committed Jun 12, 2018
Showing 15 changed files with 1,746 additions and 0 deletions.
219 changes: 219 additions & 0 deletions README.md
@@ -0,0 +1,219 @@
# graphRank

To do anything with the code, first go into the python directory:
```
cd python
```

# Usage
You can check all the options of the code using

```
python graphRank.py --help
```

```
usage: graphRank.py [-h] [--testID TESTID] [--trainID TRAINID] [--graph GRAPH]
                    [--check CHECK] [--outfile OUTFILE] [--tune_kernel]
                    [--test] [--lamb LAMB] [--walk WALK] [--method METHOD]
                    [--func FUNC] [--cuda]
                    [--gpu_block GPU_BLOCK [GPU_BLOCK ...]]
test graphRank

optional arguments:
  -h, --help            show this help message and exit
  --testID TESTID       list of ID for testing. Default: testID.lst
  --trainID TRAINID     list of ID for training. Default: trainID.lst
  --graph GRAPH         folder containing the graph of each complex.
                        Default: graphMAT
  --check CHECK         file containing the kernel. Default:
                        kernelMAT/<testID_name>.mat
  --outfile OUTFILE     Output file containing the calculated Kernel values.
                        Default: kernel.pkl
  --tune_kernel         Only tune the CUDA kernel
  --test                Only test the functions on a single pair of graphs
  --lamb LAMB           Lambda parameter in the Kernel calculations.
                        Default: 1
  --walk WALK           Max walk length in the Kernel calculations. Default: 4
  --method METHOD       Method used in the calculation: 'vect' (default),
                        'combvec', 'iter'
  --func FUNC           functions to tune in the kernel. Default: all functions
  --cuda                Use CUDA kernel
  --gpu_block GPU_BLOCK [GPU_BLOCK ...]
                        number of gpu block to use. Default: 8 8 1
```

# Test
I've built the code as a command-line tool, so before testing/using it you must make it available in your path. You can, for example, create an alias in your `.bashrc`:

```
alias graphRank=/path/to/the/library/graphRank.py
```
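
Alternatively, you can make the script executable and add its folder to your PATH; a sketch, with placeholder paths:

```
chmod +x /path/to/the/library/graphRank.py
export PATH=$PATH:/path/to/the/library
```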

To test the code, first go to the test folder:

```
cd test_code2
```

As explained above, the default values for the test and train IDs are 'testID.lst' and 'trainID.lst', so you don't need to specify them as long as you keep those file names. Similarly, the individual graphs are expected in './graphMAT' and the MATLAB-computed kernels in './kernelMAT/<testID_name>.mat', so you don't need to specify those either as long as you keep the same folder names. For reference, a test folder following these defaults looks something like this (entry names are illustrative):
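
```
test_code2/
    testID.lst          # IDs of the test complexes
    trainID.lst         # IDs of the training complexes
    graphMAT/           # one graph file per complex
        7CEI_100w.mat
        ...
    kernelMAT/
        K_testID.mat    # MATLAB-computed kernels used for checking
```

You can then test the CPU/GPU version of the code with: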

#### CPU version
```
graphRank --test
```

#### GPU version
```
graphRank --test --cuda
```

which should output something like this (GPU version):

```
--------------------
- timing
--------------------
GPU - Kern : 0.111562
GPU - Mem : 0.190918 (block size:8x8)
GPU - Kron : 0.081629 (block size:8x8)
GPU - Px : 0.002048 (block size:8x8)
GPU - W0 : 0.001714 (block size:8x8)
CPU - K : 0.024109
--------------------
- Accuracy
--------------------
K : 1.57e-05 4.61e-05 0.000175 0.000491 0.00192
Kcheck : 1.57e-05 4.61e-05 0.000175 0.000491 0.00192
```

The timing section reports the execution time of the main steps of the calculation.

* GPU - Kern : time needed to compile the CUDA kernel
* GPU - Mem : time needed to allocate the memory on the GPU

These two steps are needed only once when calculating the kernels of several pairs.

* GPU - Kron : time needed to compute the Kronecker matrix
* GPU - Px : time needed to compute the Px vector
* GPU - W0 : time needed to compute the W0 matrix
* CPU - K : time needed to compute the kernels

The last step is done on the CPU, as it would not be significantly faster on a GPU.
The code then outputs the kernel values calculated for the pair that was tested. If a valid .mat file containing the MATLAB-precomputed kernel is found (typically ./kernelMAT/K_testID.mat), the code also outputs these values for comparison.

# Kernel Tuner

The performance of the GPU code depends strongly on the block and grid sizes used. You can determine the best block size with the kernel tuner. Simply type:

```
graphRank --tune_kernel [--func=<func_name>]
```

If you don't specify the name of a function present in cuda_kernel.c, the code will tune all the functions. For each function it should output something like:

```
Tuning function create_kron_mat from ./cuda_kernel.c
----------------------------------------
Using: GeForce GTX 1080 Ti
block_size_x=2, block_size_y=2, time=0.905830395222
block_size_x=2, block_size_y=4, time=0.545791995525
block_size_x=2, block_size_y=8, time=0.355219191313
block_size_x=2, block_size_y=16, time=0.30387840271
block_size_x=2, block_size_y=32, time=0.27014400363
block_size_x=2, block_size_y=64, time=0.259091204405
block_size_x=2, block_size_y=128, time=0.250815996528
......
best performing configuration: block_size_x=8, block_size_y=8, time=0.161958396435
```
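
This output format matches the kernel_tuner Python package. Below is a minimal, self-contained sketch of that kind of tuning call, using a toy vector-add kernel since the real signatures of the functions in cuda_kernel.c are not shown here:

```python
import numpy as np
from kernel_tuner import tune_kernel

# toy CUDA kernel standing in for the real functions in cuda_kernel.c
kernel_string = """
__global__ void vector_add(float *c, float *a, float *b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}
"""

size = 1000000
n = np.int32(size)
a = np.random.randn(size).astype(np.float32)
b = np.random.randn(size).astype(np.float32)
c = np.zeros_like(a)

# block sizes to benchmark; the tuner runs the kernel once per configuration
tune_params = {'block_size_x': [32, 64, 128, 256, 512]}

# prints the timing of each configuration and the best performing one,
# in the same format as the output shown above
results, env = tune_kernel('vector_add', kernel_string, size,
                           [c, a, b, n], tune_params)
```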

# Run

You can run the calculation on the entire training/test set using

```
graphRank [--cuda] [--lamb=X] [--walk=X] [--outfile=name] [--gpu_block=i j k]
```

In the GPU case, the code will first output the timing of the kernel compilation and of the GPU memory assignment. Once again, these two steps only need to be done once.

```
GPU - Kern : 0.106779
GPU - Mem : 0.146905
```


Then, for each pair of graphs present in the train/test set, the code will output the following:

```
7CEI_100w 4CPA
--------------------
GPU - Mem : 0.001109 (block size:8x8)
GPU - Kron : 0.002521 (block size:8x8)
GPU - Px : 0.001092 (block size:8x8)
GPU - W0 : 0.001091 (block size:8x8)
CPU - K : 0.000621
--------------------
K : 0.000245 0.000402 0.00117 0.00166 0.00445
Kcheck : 0.000245 0.000402 0.00117 0.00166 0.00445
```

As you can see, if a check file (typically ./kernelMAT/K_testID.mat) is found, the code also compares the values from the MATLAB code with the ones calculated here.


# Results

After the run, the results are dumped in a pickle file (default name: kernel.pkl). You can read this file as follows:

```python
import pickle
fname = 'kernel.pkl'
K = pickle.load(open(fname, 'rb'))
```

K is then a dictionary with the following keys:

```
K['lambda'] : lambda value used for the calculation
K['walk'] : walk length used for the calculation
K['cuda'] : whether CUDA was used during the calculation (useful ?)
K['gpu_block'] : the gpu block size during the calculation (useful ?)
K[(MOL1,MOL2)] : the values of the kernel calculated for this specific pair
K[(MOL1,MOL3)] : the values of the kernel calculated for this specific pair
K[(MOL2,MOL3)] : the values of the kernel calculated for this specific pair
....
```
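
For example, to pull out the metadata and the values of one specific pair (the molecule names below are illustrative; use the keys actually present in K):

```python
lamb = K['lambda']           # metadata of the run
k12 = K[('MOL1', 'MOL2')]    # kernel values for this specific pair
```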

Using these results, you can compare the Python and MATLAB kernel values with the following script:

```python
import matplotlib.pyplot as plt
import scipy.io as spio
import pickle

# matlab kernel file
matlab = './kernelMAT/K_smalltestID.mat'

# python kernel file
python = './kernel.pkl'

# load the data
Kcheck = spio.loadmat(matlab)['K']
K = pickle.load(open(python,'rb'))

# plot the python kernel values against the matlab ones
N = len(Kcheck)
keys = list(K.keys())[4:]  # skip the 4 metadata keys (lambda, walk, cuda, gpu_block)
k = 0
for n1 in range(N):
    M = len(Kcheck[n1])
    for n2 in range(M):
        plt.scatter(Kcheck[n1][n2], K[keys[k]])
        k += 1
plt.show()
```
56 changes: 56 additions & 0 deletions bin/iScore.compute
@@ -0,0 +1,56 @@
#!/usr/bin/env python
from iScore.score_graph import ScoreGraph
import argparse

# parse arguments
parser = argparse.ArgumentParser(description=' iScore - score graphs')

# test and train IDS
parser.add_argument('--testID', type=str, default='testID.lst',help='list of ID for testing. Default: testID.lst')
parser.add_argument('--trainID', type=str, default='trainID.lst',help='list of ID for training. Default: trainID.lst')

# graphs of the individual complex
parser.add_argument('--graph',type=str,default='graph',help='folder containing the graph of each complex. Default: graph')

# file containing the kernel for checking
parser.add_argument('--check',type=str,default=None,help='file containing the kernel. Default: kernelMAT/<testID_name>.mat')

# where to write the output file
parser.add_argument('--outfile',type=str,default='kernel.pkl',help='Output file containing the calculated Kernel values. Default: kernel.pkl')

# what to do: tune the kernel, test the calculation, run the entire calculations
parser.add_argument('--tune_kernel',action='store_true',help='Only tune the CUDA kernel')
parser.add_argument('--test',action='store_true',help='Only test the functions on a single pair of graphs')

# parameter of the calculations
parser.add_argument('--lamb',type=float,default=1,help='Lambda parameter in the Kernel calculations. Default: 1')
parser.add_argument('--walk',type=int,default=4,help='Max walk length in the Kernel calculations. Default: 4')
parser.add_argument('--method',type=str,default='vect',help="Method used in the calculation: 'vect'(default), 'combvec', 'iter'")

# cuda parameters
parser.add_argument('--func',type=str,default='all',help='functions to tune in the kernel. Default: all functions')
parser.add_argument('--cuda',action='store_true', help='Use CUDA kernel')
parser.add_argument('--gpu_block',nargs='+',default=[8,8,1],type=int,help='number of gpu block to use. Default: 8 8 1')

args = parser.parse_args()

# init and load the data
GR = ScoreGraph(testIDs=args.testID,trainIDs=args.trainID,graph_path=args.graph,gpu_block=tuple(args.gpu_block),method=args.method)
GR.import_from_mat()

# get the path of the check file
checkfile = GR.get_check_file(args.check)

# only tune the kernel
if args.tune_kernel:
    GR.tune_kernel(func=args.func, test_all_func=args.func == 'all')

# run the entire calculation
else:
    GR.run(lamb=args.lamb,
           walk=args.walk,
           outfile=args.outfile,
           cuda=args.cuda,
           gpu_block=tuple(args.gpu_block),
           check=checkfile,
           test=args.test)
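
# example invocation, assuming this script is installed on your PATH
# (flags as defined above):
#   iScore.compute --cuda --lamb 1 --walk 4 --outfile kernel.pkl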
78 changes: 78 additions & 0 deletions bin/iScore.generate
@@ -0,0 +1,78 @@
#!/usr/bin/env python
import os
from iScore.generate_graph import GenGraph
import argparse



parser = argparse.ArgumentParser()
parser.add_argument('--pdb_path', type = str, default='./pdb',help='path where to find the PDB files')
parser.add_argument('--pssm_path', type = str, default='./pssm',help='path where to find the PSSM files')
parser.add_argument('--select', type = str, default=None,help='File containing the name of the pdb to process')
parser.add_argument('--outdir',type = str, default='./graph/',help='Directory where to store the graphs')
parser.add_argument('--aligned',action='store_true',help='PSSM and PDB are aligned')
args = parser.parse_args()

# make sure that the dir containing the PDBs exists
if not os.path.isdir(args.pdb_path):
    raise NotADirectoryError(args.pdb_path + ' is not a directory')
else:
    pdb_files = os.listdir(args.pdb_path)

# make sure that the dir containing the PSSMs exists
if not os.path.isdir(args.pssm_path):
    raise NotADirectoryError(args.pssm_path + ' is not a directory')
else:
    pssm_files = os.listdir(args.pssm_path)

# check if we want to select a subset of PDBs
if args.select is not None:
    if not os.path.isfile(args.select):
        raise FileNotFoundError(args.select + ' is not a file')
    with open(args.select, 'r') as f:
        # strip the newlines and store as a tuple so it can be passed to str.startswith
        select = tuple(line.strip() for line in f)
else:
    select = None

# get the list of PDB names
pdbs = list(filter(lambda x: x.endswith('.pdb'), os.listdir(args.pdb_path)))
if select is not None:
    pdbs = list(filter(lambda x: x.startswith(select), pdbs))

# create the output directory
if not os.path.isdir(args.outdir):
    os.mkdir(args.outdir)

# loop over all the PDBs
for name in pdbs:

    print('Creating graph of PDB %s' % name)

    # pdb file
    pdbfile = os.path.join(args.pdb_path, name)

    # mol name and base name
    mol_name = os.path.splitext(name)[0]
    base_name = mol_name.split('_')[0]

    # pssm files
    pssmA = os.path.join(args.pssm_path, mol_name + '.A.pdb.pssm')
    pssmB = os.path.join(args.pssm_path, mol_name + '.B.pdb.pssm')

    # check that the pssms exist
    if os.path.isfile(pssmA) and os.path.isfile(pssmB):
        pssm = {'A': pssmA, 'B': pssmB}
    else:
        raise FileNotFoundError(pssmA + ' or ' + pssmB + ' not found')

    # output file
    graphfile = os.path.join(args.outdir, mol_name + '.pckl')

    # create the graph
    gen = GenGraph(pdbfile, pssm, aligned=args.aligned, outname=graphfile)
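
# example invocation, assuming this script is installed on your PATH
# (flags as defined above):
#   iScore.generate --pdb_path ./pdb --pssm_path ./pssm --outdir ./graph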




Empty file added iScore/__init__.py
Binary file added iScore/__pycache__/__init__.cpython-36.pyc
Binary file added iScore/__pycache__/generate_graph.cpython-36.pyc
Binary file added iScore/__pycache__/graph.cpython-36.pyc
Binary file added iScore/__pycache__/graphCreate.cpython-36.pyc
Binary file added iScore/__pycache__/score_graph.cpython-36.pyc