calc_command.py replaced calc_command.json, added function support

darkreactions · May 23, 2020 · ce24f5a
1 parent 515c84b

Showing 10 changed files with 162 additions and 85 deletions.
diff --git a/HISTORY.md b/HISTORY.md
@@ -1,8 +1,15 @@
 RELEASE HISTORY
 ===============
 ### Be sure to update the version number in 'runme.py'!
+1.11 (2020-05-22)
+----------------
+  * Updated `_calc_` docstrings and user docs
+  * calc_command.json shifted to ./utils/calc_command.py
+  * `_calc_` can now evaluate simple functions, imports are handled through calc_command.py
+  * Shifting all USER level information to [google user doc](https://docs.google.com/document/d/1RQJvAlDVIfu19Tea23dLUSymLabGfwJtDnZwANtU05s/edit#)
+  * Shifting [DEV level information to wiki](https://github.com/darkreactions/ESCALATE_Capture/wiki)
 
-1.1 (2020-05-20)
+1.10 (2020-05-20)
 -----------------
   * Added calc_command.json support
   * Streamlined feature specification
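
The 1.11 entry above says `_calc_` specifications now live in a Python module and may reference simple functions. As a rough sketch of what one entry in `./utils/calc_command.py` might look like: the entry name and column names below are hypothetical, and the `command` and `variables` keys are inferred from variable names in `expworkup/handlers/calcs.py` rather than confirmed by this commit. Only `functions`, `fill_value` and `description` are read by key in the diff.

```python
import math

# Hypothetical entry for illustration only. 'functions', 'fill_value' and
# 'description' are looked up by key in calcs.py; 'command' and 'variables'
# are assumed key names inferred from the local variables used there.
CALC_COMMAND_DICT = {
    "_calc_example_log_ratio": {
        # Expression evaluated per row; 'a' and 'b' are placeholder names.
        "command": "log(a / b)",
        # Maps each placeholder to a (hypothetical) dataframe column.
        "variables": {
            "a": "_raw_example_numerator",
            "b": "_raw_example_denominator",
        },
        # Callables made available to the expression by name.
        "functions": {"log": math.log},
        "fill_value": 0,
        "description": "natural log of the ratio of two example columns",
    },
}

entry = CALC_COMMAND_DICT["_calc_example_log_ratio"]
print(sorted(entry["functions"]))  # ['log']
```

Because the module is ordinary Python, the import needed by an entry (here `math`) sits at the top of the same file. A plain `dict` keeps insertion order in Python 3.7 and later, so entries are still visited in the order they are written.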

diff --git a/README.md b/README.md
@@ -4,8 +4,8 @@
 
 **Technical Debugging:** vshekar .at. haverford.edu, gcattabrig .at. haverford.edu, 
 
-## [FAQs](https://github.com/darkreactions/ESCALATE_Capture/wiki/Users:-FAQs)
-## [Wiki](https://github.com/darkreactions/ESCALATE_Capture/wiki)
+## [FAQs](https://docs.google.com/document/d/1RQJvAlDVIfu19Tea23dLUSymLabGfwJtDnZwANtU05s/edit#bookmark=id.8sg0qwagd7yw)
+## [Developer Wiki](https://github.com/darkreactions/ESCALATE_Capture/wiki)
 
 Overview
 =================
@@ -73,7 +73,7 @@ Please report any failures of the above message to the repo admins
 
 1. Download the [securekey files](https://www.youtube.com/watch?v=oHg5SJYRHA0) and move them into the root folder (`./`, aka. current working directory, aka. `ESCALATE_report-master/` if downloaded from git). Do not distribute these keys! (Contact a dev for access)
 
-   Note: If setting up a new lab see [here](https://github.com/darkreactions/ESCALATE_Capture/wiki/Developers:--ONBOARDING-LABS:--Capture-and-Report)
+   Note: [Navigate to the wiki for more information on setting up a new lab or generating additional authentication keys](https://github.com/darkreactions/ESCALATE_Capture/wiki/Developers:--ONBOARDING-LABS:--Capture-and-Report)
 
 2. Ensure that the files 'client_secrets.json' and 'creds.json' are both present in the root folder (`./`, aka. current working directory, aka. `ESCALATE_report-master/` if downloaded from git).  The correct folder for these keys is the one which contains the runme.py script.
 
@@ -125,7 +125,7 @@ __Definitions__
    * To see all columns with naming directly from datasource use: `--raw 1`
    * __Conflicting namespaces will be purged!__
 
-4. Significant flexibility is enabled for `_feat_` (via, type_command.csv) and `_calc_` (via, calc_command.json) specification.  [For examples, discussion, and limitations of these specifications please see the USER docs.](https://docs.google.com/document/d/1RQJvAlDVIfu19Tea23dLUSymLabGfwJtDnZwANtU05s/edit#bookmark=id.1shd7vj8nkv8)
+4. Significant flexibility is enabled for `_feat_` (via, type_command.csv) and `_calc_` (via, ./utils/calc_command.py) specification.  [For examples, discussion, and limitations of these specifications please see the USER docs.](https://docs.google.com/document/d/1RQJvAlDVIfu19Tea23dLUSymLabGfwJtDnZwANtU05s/edit#bookmark=id.1shd7vj8nkv8)
    * `_calc_` generation can be skipped by using the `--disablecalcs True` flag on the CLI
    * To speed up calc and feature development the first portion of the code can be skipped by:
       1. Running the code with `--offline 1` 
@@ -137,7 +137,7 @@ __Definitions__
 
    `python runme.py <my_local_folder> -d <google_drive_target_name> --debug 1`
 
-To add additional target directories please see the how-to guide [here](https://github.com/darkreactions/ESCALATE_Capture/wiki/Developers:-Adding-New-Labs-to-devconfig.py).  If you would like to add these to the existing datasets, please issue a git merge request after you add the necessary information.
+To add additional target directories please see the how-to guide [here](https://github.com/darkreactions/ESCALATE_Capture/wiki/Developers:--ONBOARDING-LABS:--Capture-and-Report).  If you would like to add these to the existing datasets, please issue a git merge request after you add the necessary information.
 
 ## Report to Versioned Data to ESCALATion
 More detailed instructions can be found in the [ESCALATE user manual](https://docs.google.com/document/d/1RQJvAlDVIfu19Tea23dLUSymLabGfwJtDnZwANtU05s/edit?usp=sharing).

diff --git a/calc_command.json b/calc_command.json
diff --git a/expworkup/devconfig.py b/expworkup/devconfig.py
@@ -6,7 +6,7 @@
 cwd = os.getcwd()
 #######################################
 # Version Control
-RoboVersion = 2.59
+RoboVersion = 2.60
 ######################################
 # Sampler Selection
 sampler = 'wolfram' # options are 'default' or 'wolfram'
@@ -160,17 +160,18 @@
 wolfram_kernel_path = None # ensure the value can be imported on all computers.
 
 # we only need to do this check if the user wants wolfram in the first place
+linux_path = '/usr/local/Wolfram/WolframEngine/12.1/Executables/WolframKernel'
 if sampler == 'wolfram': 
     if system == "Linux":
         wolfram_kernel_path = None
         # try first path location
-        wolfram_kernel = Path('/usr/local/Wolfram/WolframEngine/12.0/Executables/WolframKernel')
+        wolfram_kernel = Path(linux_path)
         if wolfram_kernel.is_file():
-            wolfram_kernel_path = "/usr/local/Wolfram/WolframEngine/12.0/Executables/WolframKernel"
+            wolfram_kernel_path = linux_path
         # try second path location
-        wolfram_kernel_2 = Path('/usr/local/Wolfram/Mathematica/12.0/Executables/WolframKernel')
+        wolfram_kernel_2 = Path(linux_path)
         if wolfram_kernel_2.is_file():
-            wolfram_kernel_path = '/usr/local/Wolfram/Mathematica/12.0/Executables/WolframKernel'
+            wolfram_kernel_path = linux_path
         if wolfram_kernel_path is None:
             # is this allowed? maybe can do in a cleaner way but nice to automate this instead of just killing
             print('Warning: WolframKernel not successfully found, falling back on default')
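
After this hunk both the "first" and "second" location checks probe the same `linux_path`, so only one location is actually tested. One possible way to keep a real fallback is to loop over a list of candidate paths. The sketch below is an assumption, not the project's code: `first_existing` and `LINUX_KERNEL_CANDIDATES` are hypothetical names, and the second candidate is simply the Mathematica path from the removed lines.

```python
from pathlib import Path


def first_existing(candidates):
    """Return the first candidate that is an existing file (as str), else None."""
    for candidate in candidates:
        if Path(candidate).is_file():
            return str(candidate)
    return None


# Illustrative candidate list; adjust versions and locations per machine.
LINUX_KERNEL_CANDIDATES = [
    "/usr/local/Wolfram/WolframEngine/12.1/Executables/WolframKernel",
    "/usr/local/Wolfram/Mathematica/12.0/Executables/WolframKernel",
]

wolfram_kernel_path = first_existing(LINUX_KERNEL_CANDIDATES)
if wolfram_kernel_path is None:
    print("Warning: WolframKernel not successfully found, falling back on default")
```

Note one behavioural difference: in the original two-step check the second location wins when both exist, whereas this loop returns the first match.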

diff --git a/expworkup/handlers/calcs.py b/expworkup/handlers/calcs.py
@@ -9,13 +9,11 @@
 
 from utils.globals import compound_ingredient_chemical_return
 from utils.file_handling import write_debug_file
+from utils.calc_command import CALC_COMMAND_DICT
 
 modlog = logging.getLogger(f'mainlog.{__name__}')
 warnlog = logging.getLogger(f'warning.{__name__}')
 
-global CALC_COMMAND_JSON
-CALC_COMMAND_JSON = './calc_command.json'
-
 def get_mmol_df(reagent_volumes_df, 
                 object_df, 
                 chemical_count, 
@@ -83,7 +81,7 @@ def all_ratios(df, fill_value, prefix):
     df2 = df2.replace([np.inf, -np.inf, np.nan], fill_value)
     return(df2)
 
-def df_simple_eval(command, variables, x):
+def df_simple_eval(command, variables, x, command_function=None):
     """ Performs safe evals on dataframe
 
     Uses specified command with variable mapping onto x to generate numerical 
@@ -115,7 +113,7 @@ def df_simple_eval(command, variables, x):
     df_referenced_dict = {}
     for variable_name in variables.keys():
         df_referenced_dict[variable_name] = x[variables[variable_name]]
-    out_value = simple_eval(command, names=df_referenced_dict)
+    out_value = simple_eval(command, names=df_referenced_dict, functions=command_function)
     return out_value
 
 def evaluation_pipeline(all_targets, debug_bool):
@@ -137,11 +135,7 @@ def evaluation_pipeline(all_targets, debug_bool):
     calc_df = pd.DataFrame()
     calc_df['name'] = all_targets.index
     calc_df.set_index('name', inplace=True)
-
-    # ORDER MATTERS, we might want to build as we go..
-    with open(CALC_COMMAND_JSON, 'r') as f:
-        eval_dict = json.load(f, object_pairs_hook=OrderedDict)
-    f.close()
+    eval_dict = CALC_COMMAND_DICT
 
     for entry_name in eval_dict.keys():
         header_name = entry_name
@@ -157,19 +151,30 @@ def evaluation_pipeline(all_targets, debug_bool):
         else:
             # We don't want the code to bomb out due to 
             # all_targets not containing the specified headers, 
-            if not set(variables.values()).issubset(all_targets.columns):
-                modlog.warn(f"For {entry_name}, columns specified were not found! Please correct!")
-                warnlog.warn(f"For {entry_name}, columns specified were not found! Please correct!")
-            else:
+            run_function = True
+            for x in variables.values():
+                if isinstance(x, str):
+                    if x not in all_targets.columns:
+                        modlog.warn(f"For {entry_name}, columns specified were not found! Please correct!")
+                        warnlog.warn(f"For {entry_name}, columns specified were not found! Please correct!")
+                        run_function = False
+                # Handle nested lists
+                elif isinstance(x, list):
+                    if not set(x).issubset(all_targets.columns):
+                        modlog.warn(f"For {entry_name}, columns specified were not found! Please correct!")
+                        warnlog.warn(f"For {entry_name}, columns specified were not found! Please correct!")
+                        run_function = False
+            if run_function:
                 fill_value = eval_dict[entry_name].get('fill_value', 'null')
                 description = eval_dict[entry_name].get('description', 'null')
+                specified_command = eval_dict[entry_name].get('functions', None)
                 if fill_value == 'null':
                     modlog.info(f'For {entry_name}, "fill_value" was set to a default of "null"')
                 if description == 'null':
                     modlog.info(f'For {entry_name}, "description" was set to a default of "null"')
 
                 try:
-                    value_column = all_targets.apply(lambda x: df_simple_eval(command, variables, x), axis=1)
+                    value_column = all_targets.apply(lambda x: df_simple_eval(command, variables, x, command_function=specified_command), axis=1)
                 except SyntaxError:
                     modlog.warn(f'For "{entry_name}", simpleeval failed to resolve the specified command, please check specification, or debug code!')        
                     warnlog.warn(f'For "{entry_name}", simpleeval failed to resolve the specified command, please check specification, or debug code!')        
@@ -181,11 +186,17 @@ def evaluation_pipeline(all_targets, debug_bool):
                 if debug_bool:
                     debug_df = value_column.copy().to_frame()
                     for key, value in variables.items():
-                        debug_df[key] = all_targets[[value]]
+                        try:
+                            debug_df[key] = all_targets[[value]]
+                        except KeyError:
+                            warnstring = f'Nested function used in calcs, will not export all columns'
+                            modlog.warn(warnstring)
+                            debug_df['warn'] = warnstring
+                            pass
                     debug_df['variables'] = str(variables)
                     debug_df['command'] = command
                     debug_df['description'] = description
-                    calc_file = f'CALC_{entry_name.upper()}.csv'
+                    calc_file = f'{entry_name.upper()}.csv'
                     write_debug_file(debug_df, calc_file)
 
                 calc_df = calc_df.join(value_column)
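
The column check added in this hunk accepts either a single column name or a list of names for each variable. A minimal standalone sketch of that idea follows, using plain Python collections instead of a dataframe. `missing_columns` and the column names are hypothetical and not part of the commit.

```python
def missing_columns(variables, available_columns):
    """Return, sorted, the column names referenced by `variables` that are absent.

    Each value in `variables` is either one column name (str) or a list of names.
    """
    available = set(available_columns)
    missing = set()
    for value in variables.values():
        # Normalise to a list so strings and nested lists share one code path.
        names = [value] if isinstance(value, str) else list(value)
        missing.update(name for name in names if name not in available)
    return sorted(missing)


columns = ["_raw_a", "_raw_b", "_rxn_temperature_c"]
ok = missing_columns({"x": "_raw_a", "ys": ["_raw_a", "_raw_b"]}, columns)
bad = missing_columns({"x": "_raw_a", "ys": ["_raw_b", "_raw_c"]}, columns)
print(ok)   # []
print(bad)  # ['_raw_c']
```

Collecting the missing names, rather than only setting a flag, would also let the warning message say which columns were not found.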

diff --git a/expworkup/report_view.py b/expworkup/report_view.py
@@ -38,7 +38,7 @@ def construct_2d_view(report_df,
         columns are the ratio headers e.g. '_calc_ratio_acid_molarity_inorganic_molarity'
     
     calcs_df : pd.DataFrame
-        completed _calcs_ specified by the calc_command.json file
+        completed _calcs_ specified by the ./utils/calc_command.py file
         indexed on runUID ('name')
         columns are the values return from _calcs_
 

diff --git a/runme.py b/runme.py
@@ -27,7 +27,7 @@
     set_debug_simple, get_target_folder, get_log_folder, get_offline_folder
 )
 
-__version__ = 1.1 #should match latest HISTORY.md entry
+__version__ = 1.11 #should match latest HISTORY.md entry
 
 def initialize(args):
     ''' Refreshes working environment - logs initialization
@@ -293,7 +293,7 @@ def parse_args(args):
                         help='final dataframe is printed with all raw values\
                         included ||default = False||')
     parser.add_argument('--disablecalcs', type=bool, default=False, choices=[True, False],
-                        help='if True, diasables escalate calculations (calc_command.json) ||default = False||')
+                        help='if True, disables escalate calculations specified in ./utils/calc_command.py ||default = False||')
     parser.add_argument('--debug', type=bool, default=False, choices=[True, False],
                         help="exports all dataframe intermediates prefixed with 'REPORT_'\
                         csvfiles with default names")

diff --git a/type_command.csv b/type_command.csv
@@ -69,6 +69,6 @@ fr_guanido,fr_guanido,number of guanidine groups,organic,,RDKit open source soft
 fr_dihydropyridine,fr_dihydropyridine,number of dihydropyridines,organic,,RDKit open source software,RDKit,19.03.4,
 fr_amidine,fr_amidine,number of amidine groups,organic,,RDKit open source software,RDKit,19.03.4,
 fr_halogen,fr_halogen,number of halogens,organic,,RDKit open source software,RDKit,19.03.4,
-hansentriple,hansentriple,"returns three columns (deltad, deltap, deltat) for each species if present in JS_HansenSolubility external repository",solvent,,escalate,EscalateFeats,1,
+hansentriple,hansentriple,"returns three columns (deltad, deltap, deltah) for each species if present in JS_HansenSolubility external repository",solvent,,escalate,EscalateFeats,1,
 molarityratio,molarity_ratio,calculates the molarity ratio between the sum of all chemicals of the primary chemical types (first type reference in inventory) named:  `_calc_ratio_<type>_<type>_molarity` -- column name structure is fixed and cannot be changed (alternative_input dictates the value used to fill blanks and infinities after calculating numericals),all,0,escalate,EscalateCalcs,1,exists in this table so that it can be toggled without changing pipeline code -- removing will disable
 molarityratio_bytype,type_molarity_ratio,calculates the molarity ratio between all unique members which have a given chemical type in the chemical inventory -- column name: ## and %% are instances of a pariticular type -- e.g. `_calc_ratiobytype_organic_0_molarity...` would refer to identity in the  _raw_organic_0_inchikey -- column name structure is fixed and cannot be changed (alternative_input dictates the value used to fill blanks and infinities after calculating numericals),all,0,escalate,EscalateCalcs,1,exists in this table so that it can be toggled without changing pipeline code -- removing will disable
diff --git a/utils/README.md b/utils/README.md
@@ -0,0 +1,2 @@
+# Utils Description
+