make rd_filter as a python package
hsiaoyi0504 committed Aug 22, 2018
1 parent d56bc4c commit a2e496c
Showing 7 changed files with 78 additions and 60 deletions.
1 change: 1 addition & 0 deletions MANIFEST.in
@@ -0,0 +1 @@
graft rd_filters
106 changes: 53 additions & 53 deletions README.md
@@ -1,4 +1,4 @@
#### rd_filters.py
# rd_filters

This script provides a simple means of applying the
functional group filters from the ChEMBL database, as well as a number of
@@ -21,43 +21,51 @@ documentation on the different alert sets.
The SMARTS patterns in a number of these alerts were not compatible with the RDKit
so I edited them. A complete list of the changes I made is in the file **Notes.txt**.

#### Installation
## Prerequisite

This script has a few requirements
* At least Python 3.6
* The RDKit, you can find installation instructions [here](https://www.rdkit.org/docs/Install.html). I'd recommend the conda route.
* The pandas and docopt libraries
```python
pip install pandas docopt
* The RDKit, you can find installation instructions [here](https://www.rdkit.org/docs/Install.html). I'd recommend the conda route.

## Installation

### Directly install from github

`pip install git+https://github.com/PatWalters/rd_filters.git`

### Local install

``` shell
git clone https://github.com/PatWalters/rd_filters
cd rd_filters
pip install .
```
The script needs 3 files to operate.
* rd_filters.py - the main Python script.

## Usage

The script needs 2 files to operate.

* alert_collection.csv - the set of structural alerts
* rules.json - the configuration file

The script uses the following logic to find alert_collection.csv and
rules.json.
1. Use locations specified by the "--alert" (for alerts.csv) and
"--rules" (for rules.json) command line arguments.
2. Look in the current directory.
3. Look in the directory pointed to by the FILTER_RULES_DATA environment variable.
The script uses the following logic to find alert_collection.csv and rules.json.

1. Use locations specified by the "--alert" (for alerts.csv) and "--rules" (for rules.json) command line arguments.
2. Look in the current directory.
3. Look in the directory pointed to by the FILTER_RULES_DATA environment variable.

I'll provide some examples below to illustrate.


That's it, at this point you should be good to go.

#### Configuration files
### Configuration files

The file **alert_collection.csv** contains alerts. You shouldn't have to mess with this unless you
want to add your own structural alerts. I think the format is pretty obvious.
want to add your own structural alerts. I think the format is pretty obvious.

The file **rules.json** controls which filters and alerts are used. You can use the command
The file **rules.json** controls which filters and alerts are used. You can use the command
below to generate a **rules.json** with the default settings.
```
rd_filters.py template --out rules.json
```

`rd_filters template --out rules.json`

The **rules.json** file looks like this. The values for the properties are the maximum and minimum
allowed (inclusive). To set which structural alerts are used, set **true** and **false**. You can
@@ -95,28 +103,27 @@ Just edit the file with your [favorite text editor](https://www.gnu.org/software
200
]
}
```
```

#### Examples

First off, you're going to want to copy **alert_collection.csv** and
**rules.json** to a directory and set the FILTER_RULES_DATA environment
variable to point to that directory. If you are using a bash-ish shell
and the files are in /home/elvis/data that would be:
```
export FILTER_RULES_DATA=/home/elvis/data
and the files are in /home/elvis/data that would be:

```
`export FILTER_RULES_DATA=/home/elvis/data`

If you type
```
rd_filters.py -h
```
you'll see this:
```

`rd_filters -h`

you'll see this:

``` shell
Usage:
rd_filters.py filter --in INPUT_FILE --prefix PREFIX [--rules RULES_FILE_NAME] [--alerts ALERT_FILE_NAME][--np NUM_CORES]
rd_filters.py template --out TEMPLATE_FILE [--rules RULES_FILE_NAME]
rd_filters filter --in INPUT_FILE --prefix PREFIX [--rules RULES_FILE_NAME] [--alerts ALERT_FILE_NAME][--np NUM_CORES]
rd_filters template --out TEMPLATE_FILE [--rules RULES_FILE_NAME]

Options:
--in INPUT_FILE input file name
@@ -131,9 +138,8 @@ Options:
The basic operation is pretty simple. If I want to filter a file called test.smi and
I want my output files to start with "out", I could do something like this:
```
rd_filters.py filter --in test.smi --prefix out
```
`rd_filters filter --in test.smi --prefix out`
This will create 2 files
* **out.smi** - contains the SMILES strings and molecule names for all of the compounds
passing the filters
@@ -142,28 +148,22 @@ by a molecule
By default, this script runs in parallel and uses all available processors. To
change this value, use the --np flag.
```
rd_filters.py filter --in test.smi --prefix out --np 4
```
`rd_filters filter --in test.smi --prefix out --np 4`
As mentioned above, alternate rules files or alerts files can be specified on
the command line.
```
rd_filters.py filter --in test.smi --prefix out --rules myrules.json
rd_filters.py filter --in test.smi --prefix out --alerts myalerts.csv
rd_filters.py filter --in test.smi --prefix out --rules myrules.json --alerts myalerts.csv
```
A new default rules template file can be generated using the **template** option.
```
rdfilters.py template --out myrules.json
``` shell
rd_filters filter --in test.smi --prefix out --rules myrules.json
rd_filters filter --in test.smi --prefix out --alerts myalerts.csv
rd_filters filter --in test.smi --prefix out --rules myrules.json --alerts myalerts.csv
```
As always please let me know if you have questions, comments, etc.

Pat Walters, August 2018


A new default rules template file can be generated using the **template** option.
`rdfilters.py template --out myrules.json`
As always please let me know if you have questions, comments, etc.
Pat Walters, August 2018
Empty file added rd_filters/__init__.py
Empty file.
File renamed without changes.
File renamed without changes.
16 changes: 9 additions & 7 deletions rd_filters.py → rd_filters/rd_filters.py
@@ -10,10 +10,12 @@
import os
import json
from docopt import docopt
import pkg_resources


cmd_str = """Usage:
rd_filters.py filter --in INPUT_FILE --prefix PREFIX [--rules RULES_FILE_NAME] [--alerts ALERT_FILE_NAME][--np NUM_CORES]
rd_filters.py template --out TEMPLATE_FILE [--rules RULES_FILE_NAME]
rd_filters filter --in INPUT_FILE --prefix PREFIX [--rules RULES_FILE_NAME] [--alerts ALERT_FILE_NAME][--np NUM_CORES]
rd_filters template --out TEMPLATE_FILE [--rules RULES_FILE_NAME]
Options:
--in INPUT_FILE input file name
@@ -77,7 +79,7 @@ def default_rule_template(alert_list, file_name):

def get_config_file(file_name, environment_variable):
"""
Read a configuration file, first look for the file in the current directory, if you can't find
Read a configuration file, first look for the file, if you can't find
it there, look in the directory pointed to by environment_variable
:param file_name: the configuration file
:param environment_variable: the environment variable
@@ -92,12 +94,12 @@ def get_config_file(file_name, environment_variable):
if os.path.exists(config_file_path):
return config_file_path

error_list = [f"Could not file {file_name} in the current directory"]
error_list = [f"Could not file {file_name}"]
if config_dir:
err_str = f"Could not find {config_file_path} based on the {environment_variable}" + \
"environment variable"
error_list.append(err_str)
error_list.append(f"Please put {file_name} in the current directory")
error_list.append(f"Please check {file_name} exists")
error_list.append(f"Or in the directory pointed to by the {environment_variable} environment variable")
print("\n".join(error_list))
sys.exit(1)
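
The lookup order described in the `get_config_file` docstring can be sketched as below. This is a minimal illustration, not the code from this commit: `find_config_file` is a hypothetical helper that returns `None` on failure instead of printing the error list and calling `sys.exit(1)`.

```python
import os
from typing import Optional


def find_config_file(file_name: str, environment_variable: str) -> Optional[str]:
    """Return a usable path for file_name, or None if it cannot be found.

    Hypothetical helper: mirrors the documented lookup order only.
    """
    # 1. Accept the name as given (absolute, or relative to the current directory).
    if os.path.exists(file_name):
        return file_name
    # 2. Fall back to the directory named by the environment variable, if it is set.
    config_dir = os.environ.get(environment_variable)
    if config_dir:
        candidate = os.path.join(config_dir, file_name)
        if os.path.exists(candidate):
            return candidate
    return None
```

One thing to keep in mind: `os.path.join` discards `config_dir` when `file_name` is an absolute path, so the environment-variable fallback only matters for relative names such as `rules.json`.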
@@ -152,7 +154,7 @@ def evaluate(self, lst_in):

def main():
cmd_input = docopt(cmd_str)
alert_file_name = cmd_input.get("--alerts") or "alert_collection.csv"
alert_file_name = cmd_input.get("--alerts") or pkg_resources.resource_filename('rd_filters', "data/alert_collection.csv")
rf = RDFilters(alert_file_name)

if cmd_input.get("template"):
@@ -161,7 +163,7 @@ def main():

elif cmd_input.get("filter"):
input_file_name = cmd_input.get("--in")
rules_file_name = cmd_input.get("--rules") or "rules.json"
rules_file_name = cmd_input.get("--rules") or pkg_resources.resource_filename('rd_filters', "data/rules.json")
rules_file_path = get_config_file(rules_file_name, "FILTER_RULES_DATA")
prefix_name = cmd_input.get("--prefix")
num_cores = cmd_input.get("--np") or mp.cpu_count()
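
The "command-line value, else the file shipped with the package" pattern used in `main()` can also be written with the standard library. The sketch below assumes Python 3.9 or later and uses `importlib.resources.files` in place of `pkg_resources.resource_filename`; `default_data_path` is a hypothetical name and is not part of this commit.

```python
from importlib.resources import files
from typing import Optional


def default_data_path(package: str, relative_name: str,
                      override: Optional[str] = None) -> str:
    """Return override if given, otherwise the path of a file inside package.

    Hypothetical helper for illustration only.
    """
    if override:
        return override
    # files() returns a Traversable rooted at the installed package.
    # For a normal (unzipped) install, str() of it is a real filesystem path.
    return str(files(package).joinpath(relative_name))
```

With the layout from this commit, a call might look like `default_data_path("rd_filters", "data/rules.json", cmd_input.get("--rules"))`, assuming the data files are installed under `rd_filters/data`.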
15 changes: 15 additions & 0 deletions setup.py
@@ -0,0 +1,15 @@
from setuptools import setup, find_packages


setup(
    name="rd_filters",
    version="0.1",
    packages=find_packages(),
    entry_points={
        'console_scripts': [
            'rd_filters=rd_filters.rd_filters:main',
        ],
    },
    install_requires=['pandas', 'docopt'],
    include_package_data=True,
)
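
The `console_scripts` entry `'rd_filters=rd_filters.rd_filters:main'` follows the `name=module:attribute` convention: installing the package creates an `rd_filters` command that imports the module and calls `main`. The sketch below shows roughly how such a string maps to a callable. It is a simplified illustration with a hypothetical `resolve_entry_point` function; real installers generate a wrapper script and handle more cases.

```python
import importlib


def resolve_entry_point(spec: str):
    """Split 'name=module:attr' and return (name, resolved object).

    Hypothetical helper for illustration only.
    """
    name, target = (part.strip() for part in spec.split("=", 1))
    module_name, _, attr_path = target.partition(":")
    obj = importlib.import_module(module_name)
    # Walk dotted attributes after the colon, e.g. "Class.method".
    for attr in filter(None, attr_path.split(".")):
        obj = getattr(obj, attr)
    return name, obj
```

With the package installed, `resolve_entry_point("rd_filters=rd_filters.rd_filters:main")` would return the command name together with the `main` function that the generated script calls.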
