calc_command.py replaced calc_command.json, added function support
Ian Pendleton committed May 23, 2020
1 parent 515c84b commit ce24f5a
Showing 10 changed files with 162 additions and 85 deletions.
9 changes: 8 additions & 1 deletion HISTORY.md
@@ -1,8 +1,15 @@
RELEASE HISTORY
===============
### Be sure to update the version number in 'runme.py'!
1.11 (2020-05-22)
----------------
* Updated `_calc_` docstrings and user docs
* calc_command.json shifted to ./utils/calc_command.py
* `_calc_` can now evaluate simple functions, imports are handled through calc_command.py
* Shifting all USER level information to [google user doc](https://docs.google.com/document/d/1RQJvAlDVIfu19Tea23dLUSymLabGfwJtDnZwANtU05s/edit#)
* Shifting [DEV level information to wiki](https://github.com/darkreactions/ESCALATE_Capture/wiki)
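
The move from JSON to a Python module matters because a `functions` entry has to hold callables, which JSON cannot represent. Below is a minimal sketch of what one entry might look like. The entry name, the column names, and the `command`/`variables` key names are assumptions inferred from how `calcs.py` reads the dictionary; they are not copied from `./utils/calc_command.py`.

```python
import json
import math

# Hypothetical entry; the real CALC_COMMAND_DICT lives in ./utils/calc_command.py.
EXAMPLE_CALC_COMMAND_DICT = {
    "_calc_example_log_ratio": {
        "command": "log10(a / b)",                      # expression to evaluate
        "variables": {"a": "_raw_example_col_a",         # name -> dataframe column
                      "b": "_raw_example_col_b"},
        "functions": {"log10": math.log10},              # callables the command may use
        "fill_value": 0,
        "description": "illustrative log ratio of two columns",
    }
}


def is_json_serializable(obj):
    """Return True if json.dumps accepts the object, False on TypeError."""
    try:
        json.dumps(obj)
        return True
    except TypeError:
        return False
```

Because the `functions` value holds a function object, `is_json_serializable(EXAMPLE_CALC_COMMAND_DICT)` is false, while the same entry without `functions` would serialize without trouble.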

1.1 (2020-05-20)
1.10 (2020-05-20)
-----------------
* Added calc_command.json support
* Streamlined feature specification
10 changes: 5 additions & 5 deletions README.md
@@ -4,8 +4,8 @@

**Technical Debugging:** vshekar .at. haverford.edu, gcattabrig .at. haverford.edu,

## [FAQs](https://github.com/darkreactions/ESCALATE_Capture/wiki/Users:-FAQs)
## [Wiki](https://github.com/darkreactions/ESCALATE_Capture/wiki)
## [FAQs](https://docs.google.com/document/d/1RQJvAlDVIfu19Tea23dLUSymLabGfwJtDnZwANtU05s/edit#bookmark=id.8sg0qwagd7yw)
## [Developer Wiki](https://github.com/darkreactions/ESCALATE_Capture/wiki)

Overview
=================
@@ -73,7 +73,7 @@ Please report any failures of the above message to the repo admins

1. Download the [securekey files](https://www.youtube.com/watch?v=oHg5SJYRHA0) and move them into the root folder (`./`, aka. current working directory, aka. `ESCALATE_report-master/` if downloaded from git). Do not distribute these keys! (Contact a dev for access)

Note: If setting up a new lab see [here](https://github.com/darkreactions/ESCALATE_Capture/wiki/Developers:--ONBOARDING-LABS:--Capture-and-Report)
Note: [Navigate to the wiki for more information on setting up a new lab or generating additional authentication keys](https://github.com/darkreactions/ESCALATE_Capture/wiki/Developers:--ONBOARDING-LABS:--Capture-and-Report)

2. Ensure that the files 'client_secrets.json' and 'creds.json' are both present in the root folder (`./`, aka. current working directory, aka. `ESCALATE_report-master/` if downloaded from git). The correct folder for these keys is the one which contains the runme.py script.

@@ -125,7 +125,7 @@ __Definitions__
* To see all columns with naming directly from datasource use: `--raw 1`
* __Conflicting namespaces will be purged!__

4. Significant flexibility is enabled for `_feat_` (via, type_command.csv) and `_calc_` (via, calc_command.json) specification. [For examples, discussion, and limitations of these specifications please see the USER docs.](https://docs.google.com/document/d/1RQJvAlDVIfu19Tea23dLUSymLabGfwJtDnZwANtU05s/edit#bookmark=id.1shd7vj8nkv8)
4. Significant flexibility is enabled for `_feat_` (via type_command.csv) and `_calc_` (via ./utils/calc_command.py) specification. [For examples, discussion, and limitations of these specifications please see the USER docs.](https://docs.google.com/document/d/1RQJvAlDVIfu19Tea23dLUSymLabGfwJtDnZwANtU05s/edit#bookmark=id.1shd7vj8nkv8)
* `_calc_` generation can be skipped by using the `--disablecalcs True` flag on the CLI
* To speed up calc and feature development the first portion of the code can be skipped by:
1. Running the code with `--offline 1`
@@ -137,7 +137,7 @@ __Definitions__

`python runme.py <my_local_folder> -d <google_drive_target_name> --debug 1`

To add additional target directories please see the how-to guide [here](https://github.com/darkreactions/ESCALATE_Capture/wiki/Developers:-Adding-New-Labs-to-devconfig.py). If you would like to add these to the existing datasets, please issue a git merge request after you add the necessary information.
To add additional target directories please see the how-to guide [here](https://github.com/darkreactions/ESCALATE_Capture/wiki/Developers:--ONBOARDING-LABS:--Capture-and-Report). If you would like to add these to the existing datasets, please issue a git merge request after you add the necessary information.

## Report to Versioned Data to ESCALATion
More detailed instructions can be found in the [ESCALATE user manual](https://docs.google.com/document/d/1RQJvAlDVIfu19Tea23dLUSymLabGfwJtDnZwANtU05s/edit?usp=sharing).
53 changes: 0 additions & 53 deletions calc_command.json

This file was deleted.

11 changes: 6 additions & 5 deletions expworkup/devconfig.py
@@ -6,7 +6,7 @@
cwd = os.getcwd()
#######################################
# Version Control
RoboVersion = 2.59
RoboVersion = 2.60
######################################
# Sampler Selection
sampler = 'wolfram' # options are 'default' or 'wolfram'
@@ -160,17 +160,18 @@
wolfram_kernel_path = None # ensure the value can be imported on all computers.

# we only need to do this check if the user wants wolfram in the first place
linux_path = '/usr/local/Wolfram/WolframEngine/12.1/Executables/WolframKernel'
if sampler == 'wolfram':
if system == "Linux":
wolfram_kernel_path = None
# try first path location
wolfram_kernel = Path('/usr/local/Wolfram/WolframEngine/12.0/Executables/WolframKernel')
wolfram_kernel = Path(linux_path)
if wolfram_kernel.is_file():
wolfram_kernel_path = "/usr/local/Wolfram/WolframEngine/12.0/Executables/WolframKernel"
wolfram_kernel_path = linux_path
# try second path location
wolfram_kernel_2 = Path('/usr/local/Wolfram/Mathematica/12.0/Executables/WolframKernel')
wolfram_kernel_2 = Path(linux_path)
if wolfram_kernel_2.is_file():
wolfram_kernel_path = '/usr/local/Wolfram/Mathematica/12.0/Executables/WolframKernel'
wolfram_kernel_path = linux_path
if wolfram_kernel_path is None:
# is this allowed? maybe can do in a cleaner way but nice to automate this instead of just killing
print('Warning: WolframKernel not successfully found, falling back on default')
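
The two-location lookup above can also be written as a loop over candidate paths, which keeps each location distinct. This is a sketch only: `find_first_existing` and `LINUX_KERNEL_CANDIDATES` are hypothetical names, and the two paths are the WolframEngine 12.1 and Mathematica 12.0 locations that appear in this diff.

```python
from pathlib import Path


def find_first_existing(candidates):
    """Return the first candidate that is an existing file, else None."""
    for candidate in candidates:
        if Path(candidate).is_file():
            return str(candidate)
    return None


# Candidate kernel locations, checked in order (both taken from the diff above).
LINUX_KERNEL_CANDIDATES = [
    '/usr/local/Wolfram/WolframEngine/12.1/Executables/WolframKernel',
    '/usr/local/Wolfram/Mathematica/12.0/Executables/WolframKernel',
]

wolfram_kernel_path = find_first_existing(LINUX_KERNEL_CANDIDATES)
if wolfram_kernel_path is None:
    print('Warning: WolframKernel not successfully found, falling back on default')
```

Note one behavioural difference: here the first match wins, whereas sequential `if` checks let a later location override an earlier one.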
45 changes: 28 additions & 17 deletions expworkup/handlers/calcs.py
@@ -9,13 +9,11 @@

from utils.globals import compound_ingredient_chemical_return
from utils.file_handling import write_debug_file
from utils.calc_command import CALC_COMMAND_DICT

modlog = logging.getLogger(f'mainlog.{__name__}')
warnlog = logging.getLogger(f'warning.{__name__}')

global CALC_COMMAND_JSON
CALC_COMMAND_JSON = './calc_command.json'

def get_mmol_df(reagent_volumes_df,
object_df,
chemical_count,
@@ -83,7 +81,7 @@ def all_ratios(df, fill_value, prefix):
df2 = df2.replace([np.inf, -np.inf, np.nan], fill_value)
return(df2)

def df_simple_eval(command, variables, x):
def df_simple_eval(command, variables, x, command_function=None):
""" Performs safe evals on dataframe
Uses specified command with variable mapping onto x to generate numerical
@@ -115,7 +113,7 @@ def df_simple_eval(command, variables, x):
df_referenced_dict = {}
for variable_name in variables.keys():
df_referenced_dict[variable_name] = x[variables[variable_name]]
out_value = simple_eval(command, names=df_referenced_dict)
out_value = simple_eval(command, names=df_referenced_dict, functions=command_function)
return out_value
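
The name-mapping step in `df_simple_eval` can be illustrated without pandas or simpleeval. In the sketch below, `build_names` and `toy_eval` are hypothetical helpers and the column names are invented. `toy_eval` is a stand-in for `simple_eval` that uses the built-in `eval`, which is not safe for untrusted input, and a plain dict stands in for a dataframe row. Only string column references are covered, not nested lists.

```python
import math


def build_names(variables, row):
    """Map each command variable to the row value of its column."""
    return {name: row[column] for name, column in variables.items()}


def toy_eval(command, names, functions=None):
    # Stand-in for simpleeval.simple_eval so the sketch runs without
    # third-party packages. Unlike simpleeval, eval() is NOT safe for
    # untrusted input; do not use this in the pipeline itself.
    scope = dict(functions or {})
    scope.update(names)
    return eval(command, {"__builtins__": {}}, scope)


# A plain dict stands in for one dataframe row (hypothetical column names).
row = {"_raw_example_col_a": 100.0, "_raw_example_col_b": 10.0}
variables = {"a": "_raw_example_col_a", "b": "_raw_example_col_b"}

result = toy_eval("log10(a / b)", build_names(variables, row),
                  functions={"log10": math.log10})
```

The new `command_function` argument plays the role of `functions` here: without it, a name such as `log10` in the command cannot be resolved.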

def evaluation_pipeline(all_targets, debug_bool):
@@ -137,11 +135,7 @@ def evaluation_pipeline(all_targets, debug_bool):
calc_df = pd.DataFrame()
calc_df['name'] = all_targets.index
calc_df.set_index('name', inplace=True)

# ORDER MATTERS, we might want to build as we go..
with open(CALC_COMMAND_JSON, 'r') as f:
eval_dict = json.load(f, object_pairs_hook=OrderedDict)
f.close()
eval_dict = CALC_COMMAND_DICT

for entry_name in eval_dict.keys():
header_name = entry_name
@@ -157,19 +151,30 @@ def evaluation_pipeline(all_targets, debug_bool):
else:
# We don't want the code to bomb out due to
# all_targets not containing the specified headers,
if not set(variables.values()).issubset(all_targets.columns):
modlog.warn(f"For {entry_name}, columns specified were not found! Please correct!")
warnlog.warn(f"For {entry_name}, columns specified were not found! Please correct!")
else:
run_function = True
for x in variables.values():
if isinstance(x, str):
if x not in all_targets.columns:
modlog.warn(f"For {entry_name}, columns specified were not found! Please correct!")
warnlog.warn(f"For {entry_name}, columns specified were not found! Please correct!")
run_function = False
# Handle nested lists
elif isinstance(x, list):
if not set(x).issubset(all_targets.columns):
modlog.warn(f"For {entry_name}, columns specified were not found! Please correct!")
warnlog.warn(f"For {entry_name}, columns specified were not found! Please correct!")
run_function = False
if run_function:
fill_value = eval_dict[entry_name].get('fill_value', 'null')
description = eval_dict[entry_name].get('description', 'null')
specified_command = eval_dict[entry_name].get('functions', None)
if fill_value == 'null':
modlog.info(f'For {entry_name}, "fill_value" was set to a default of "null"')
if description == 'null':
modlog.info(f'For {entry_name}, "description" was set to a default of "null"')

try:
value_column = all_targets.apply(lambda x: df_simple_eval(command, variables, x), axis=1)
value_column = all_targets.apply(lambda x: df_simple_eval(command, variables, x, command_function=specified_command), axis=1)
except SyntaxError:
modlog.warn(f'For "{entry_name}", simpleeval failed to resolve the specified command, please check specification, or debug code!')
warnlog.warn(f'For "{entry_name}", simpleeval failed to resolve the specified command, please check specification, or debug code!')
@@ -181,11 +186,17 @@ def evaluation_pipeline(all_targets, debug_bool):
if debug_bool:
debug_df = value_column.copy().to_frame()
for key, value in variables.items():
debug_df[key] = all_targets[[value]]
try:
debug_df[key] = all_targets[[value]]
except KeyError:
warnstring = f'Nested function used in calcs, will not export all columns'
modlog.warn(warnstring)
debug_df['warn'] = warnstring
pass
debug_df['variables'] = str(variables)
debug_df['command'] = command
debug_df['description'] = description
calc_file = f'CALC_{entry_name.upper()}.csv'
calc_file = f'{entry_name.upper()}.csv'
write_debug_file(debug_df, calc_file)

calc_df = calc_df.join(value_column)
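
The column check in `evaluation_pipeline` has to handle both plain column names and nested lists of names. One way to do that in a single pass is sketched below; `missing_columns` is a hypothetical helper, not part of `calcs.py`. It avoids building a `set` directly from values that may include lists, since lists are unhashable.

```python
def missing_columns(variables, available_columns):
    """Return referenced column names that are absent, flattening nested lists."""
    available = set(available_columns)
    missing = []
    for value in variables.values():
        # Treat a plain string as a one-element list so both cases share a path.
        names = value if isinstance(value, list) else [value]
        missing.extend(name for name in names if name not in available)
    return missing
```

A caller could log one warning listing the returned names and skip the calculation whenever the list is non-empty.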
2 changes: 1 addition & 1 deletion expworkup/report_view.py
@@ -38,7 +38,7 @@ def construct_2d_view(report_df,
columns are the ratio headers e.g. '_calc_ratio_acid_molarity_inorganic_molarity'
calcs_df : pd.DataFrame
completed _calcs_ specified by the calc_command.json file
completed _calcs_ specified by the ./utils/calc_command.py file
indexed on runUID ('name')
columns are the values return from _calcs_
Expand Down
4 changes: 2 additions & 2 deletions runme.py
@@ -27,7 +27,7 @@
set_debug_simple, get_target_folder, get_log_folder, get_offline_folder
)

__version__ = 1.1 #should match latest HISTORY.md entry
__version__ = 1.11 #should match latest HISTORY.md entry

def initialize(args):
''' Refreshes working environment - logs initialization
@@ -293,7 +293,7 @@ def parse_args(args):
help='final dataframe is printed with all raw values\
included ||default = False||')
parser.add_argument('--disablecalcs', type=bool, default=False, choices=[True, False],
help='if True, diasables escalate calculations (calc_command.json) ||default = False||')
help='if True, disables escalate calculations specified in ./utils/calc_command.py ||default = False||')
parser.add_argument('--debug', type=bool, default=False, choices=[True, False],
help="exports all dataframe intermediates prefixed with 'REPORT_'\
csvfiles with default names")
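
A caveat on these flags: with `type=bool`, argparse calls `bool()` on the raw string, so any non-empty value, including `'False'`, parses as `True`. The sketch below contrasts that with an explicit converter. `str2bool` and the `--legacy` flag are hypothetical and not part of `runme.py`.

```python
import argparse


def str2bool(text):
    """Hypothetical converter: map common true/false spellings to a bool."""
    lowered = text.strip().lower()
    if lowered in ('true', '1', 'yes'):
        return True
    if lowered in ('false', '0', 'no'):
        return False
    raise argparse.ArgumentTypeError(f'expected a boolean, got {text!r}')


parser = argparse.ArgumentParser()
# Explicit converter: 'False' really becomes False.
parser.add_argument('--disablecalcs', type=str2bool, default=False)
# Mirrors the type=bool pattern: bool('False') is True.
parser.add_argument('--legacy', type=bool, default=False)

args = parser.parse_args(['--disablecalcs', 'False', '--legacy', 'False'])
```

After parsing, `args.disablecalcs` is `False` while `args.legacy` is `True`, even though both flags received the string `False`.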
2 changes: 1 addition & 1 deletion type_command.csv
@@ -69,6 +69,6 @@ fr_guanido,fr_guanido,number of guanidine groups,organic,,RDKit open source software,RDKit,19.03.4,
fr_dihydropyridine,fr_dihydropyridine,number of dihydropyridines,organic,,RDKit open source software,RDKit,19.03.4,
fr_amidine,fr_amidine,number of amidine groups,organic,,RDKit open source software,RDKit,19.03.4,
fr_halogen,fr_halogen,number of halogens,organic,,RDKit open source software,RDKit,19.03.4,
hansentriple,hansentriple,"returns three columns (deltad, deltap, deltat) for each species if present in JS_HansenSolubility external repository",solvent,,escalate,EscalateFeats,1,
hansentriple,hansentriple,"returns three columns (deltad, deltap, deltah) for each species if present in JS_HansenSolubility external repository",solvent,,escalate,EscalateFeats,1,
molarityratio,molarity_ratio,calculates the molarity ratio between the sum of all chemicals of the primary chemical types (first type reference in inventory) named: `_calc_ratio_<type>_<type>_molarity` -- column name structure is fixed and cannot be changed (alternative_input dictates the value used to fill blanks and infinities after calculating numericals),all,0,escalate,EscalateCalcs,1,exists in this table so that it can be toggled without changing pipeline code -- removing will disable
molarityratio_bytype,type_molarity_ratio,calculates the molarity ratio between all unique members which have a given chemical type in the chemical inventory -- column name: ## and %% are instances of a particular type -- e.g. `_calc_ratiobytype_organic_0_molarity...` would refer to identity in the _raw_organic_0_inchikey -- column name structure is fixed and cannot be changed (alternative_input dictates the value used to fill blanks and infinities after calculating numericals),all,0,escalate,EscalateCalcs,1,exists in this table so that it can be toggled without changing pipeline code -- removing will disable
2 changes: 2 additions & 0 deletions utils/README.md
@@ -0,0 +1,2 @@
# Utils Description
