Commit

20210927 updated
renan991995 committed Sep 27, 2021
1 parent a4c1a92 commit 749529f
Showing 22 changed files with 5,688 additions and 48 deletions.
2 changes: 1 addition & 1 deletion LICENSE.txt
@@ -652,7 +652,7 @@ Also add information on how to contact you by electronic and paper mail.
If the program does terminal interaction, make it output a short
notice like this when it starts in an interactive mode:

<Molecular Data Structure> Copyright (C) <2019> <Hsuan Hao Hsu, Chen Hsuan Huang, and Shiang Tai Lin>
<MARS-PLUS> Copyright (C) <2021> <Chen Hsuan Huang and Shiang Tai Lin>
This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.
161 changes: 121 additions & 40 deletions README.txt
@@ -1,53 +1,134 @@
1. Introduction
MARS-PLUS, Molecular Assembling and Representation Suite, is a program for
general-purpose computer-aided molecular design. This program uses an
integer array data structure to store the information of atom type,
atom connectivity, and bond order of a molecule. There are also
subroutines that can be used to modify the data structure in order to
generate new molecules. This program is especially useful for
computer molecular design problems.

The source code consists of 7 header files and 6 cpp files: (see src/)
ELEMENTS.h MOLECULE.h CASES_NEU.h CASES_IL_INDEPENDENT.h CASES_IL.h UTILITY.h PARAMETER.h
ELEMENTS.cpp MOLECULE.cpp CASES_NEU.cpp CASES_IL_INDEPENDENT.cpp CASES_IL.cpp UTILITY.cpp main.cpp

There are several input files to start up MARS-PLUS: (see INPUTS/)
INPUTS/control.in : controls the input, output, and calculation options.
INPUTS/ELEMENT_LISTS/element_list.txt : a list that defines base element library.
INPUTS/INPUT_CHEMICALS/IL4.txt : the beginning chemicals.

The calculation results for every operation will be outputted as a file (see LOG_FILES/):
For example, the results of bond change operation on an IL will be outputted to LOG_FILES/change_bnd_IL.txt
########## Introduction to MARS ##########
MARS, Molecular Assembling and Representation Suite, [1] is a general-purpose computer-aided molecular design (CAMD) [2]
program. The program uses five arrays of integers as the molecular data structure (MDS)
to bookkeep a molecular structure (e.g. constituent atoms, molecular connectivity, and formal charges).

Genetic operators (i.e. ring formation, addition, subtraction, exchange, crossover, and combination)
were also developed so that an MDS can be modified to generate new
chemical species. MARS has been applied to computer-aided molecular design problems and has been found helpful. [3]
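
As a rough illustration of the integer-array idea described above, the following C++ sketch stores a molecule as five parallel integer arrays. The struct `ToyMDS`, its array names (`element`, `parent`, `bondOrder`, `charge`, `ringId`), and the helpers `addAtom` and `makeAceticAcid` are hypothetical and chosen only for this example; they are not taken from the MARS source, whose actual layout is defined in src/MOLECULE.h.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical molecular data structure: each array holds one entry per atom.
struct ToyMDS {
    std::vector<int> element;    // atomic number of each atom
    std::vector<int> parent;     // index of the atom this atom is bonded to (-1 for the root)
    std::vector<int> bondOrder;  // order of the bond to the parent (0 for the root)
    std::vector<int> charge;     // formal charge
    std::vector<int> ringId;     // ring-closure label (0 = no ring closure on this atom)

    // Append an atom bonded to `parentIdx` and return the new atom's index.
    int addAtom(int atomicNumber, int parentIdx, int order, int formalCharge = 0) {
        element.push_back(atomicNumber);
        parent.push_back(parentIdx);
        bondOrder.push_back(order);
        charge.push_back(formalCharge);
        ringId.push_back(0);
        return static_cast<int>(element.size()) - 1;
    }

    std::size_t size() const { return element.size(); }
};

// Build acetic acid, CC(=O)O, with heavy atoms only.
ToyMDS makeAceticAcid() {
    ToyMDS m;
    int c1 = m.addAtom(6, -1, 0);  // methyl carbon (root)
    int c2 = m.addAtom(6, c1, 1);  // carbonyl carbon
    m.addAtom(8, c2, 2);           // carbonyl oxygen, double bond
    m.addAtom(8, c2, 1);           // hydroxyl oxygen
    return m;
}
```

Because every property lives in a plain integer array, an operator such as addition reduces to appending one entry to each array, which is what `addAtom` does in this sketch.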

Refs:
[1] Hsu, H.-H.; Huang, C.-H.; Lin, S.-T., New Data Structure for Computational Molecular Design with Atomic or Fragment Resolution.
J. Chem. Inf. Model. 2019, 59, (9), 3703-3713.
(https://github.com/hsuhsuanhao/MARS)

[2] Austin, N. D.; Sahinidis, N. V.; Trahan, D. W., Computer-aided molecular design: An introduction and review of tools, applications,
and solution techniques. Chem. Eng. Res. Des. 2016, 116, 2-26.

[3] Hsu, H. H.; Huang, C. H.; Lin, S. T., Fully Automated Molecular Design with Atomic Resolution for Desired Thermophysical Properties.
Ind. Eng. Chem. Res. 2018, 57, (29), 9683-9692.



########## Introduction to MARS-PLUS - What's new? ##########
MARS-PLUS is a CAMD program for general purposes.
This program is developed based on the prototype of MARS [1], with various improvements:


=========================================================================
1. The expansion of base element library:
1-1. Group-like elements are allowed now.
1-2. Common neutral atoms, ionic cores, anionic cores are included.

2. The generalization of MDS:
2-1. An extra array of integers is used to bookkeep atomic chirality.
2-2. Two extra arrays of integers are used to bookkeep cis-trans isomerism.
2-3. An extra array of integers is used to bookkeep cyclic bonds.
2-4. Multiple ring numbers on an atom are allowed now.
2-5. The representation of 2-component chemicals is allowed now. (1:1 ILs are demonstrated here)

3. The improvements on genetic operators:
3-1. Refinement of old operators:
3-1-1. The feasibility of molecular connectivity is ensured after subtraction.
3-1-2. Multiple ring numbers on an atom can happen through cyclization operator.
3-1-3. Most of them are greatly revised for a more consistent approach.
3-2. Development of new operators:
insertion, decyclization, element change, cis-trans inversion,
chirality inversion, and component switch.
3-3. Check cis-trans and chirality after genetic operations. (default: trans and non-chiral)

4. The incorporation of Open Babel facilitates the data utilization of designed chemicals.
For example, one can convert a SMILES (outputted from MARS-PLUS) into 3D molecular structure
for ab initio calculations.

5. Development of Open Babel-based SMILES enumerator:
The SMILES enumerator can generate many synonymous SMILES from a given SMILES, and has been
found useful for improving NN-based models. [2]
We provide an Open Babel-based SMILES enumerator as a component of MARS-PLUS (see src/UTILITY.cpp).
We also provide a standalone SMILES enumerator (see src/standalone_SMI_Enum/SMI_Enumerator.cpp)
that can compare the performance of the Open Babel-based and RDKit-based enumerators.
=========================================================================
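
Item 3-1-1 above states that molecular connectivity stays feasible after subtraction. One simple way to obtain that guarantee in a parent-pointer representation is to remove only atoms that no other atom is attached to. The C++ sketch below illustrates that idea under stated assumptions: `ToyTree`, `isLeaf`, `subtractAtom`, and `makePropanol` are hypothetical names, and this is not the algorithm implemented in src/MOLECULE.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical parent-pointer molecule: parent[i] is the atom that atom i
// is attached to, or -1 for the root atom.
struct ToyTree {
    std::vector<int> element;
    std::vector<int> parent;
};

// True if no other atom is attached to atom `idx`.
bool isLeaf(const ToyTree& m, int idx) {
    for (int p : m.parent)
        if (p == idx) return false;
    return true;
}

// Remove atom `idx` only if doing so cannot disconnect the molecule,
// i.e. the atom is a leaf and is not the last remaining atom.
// Returns true if the atom was removed.
bool subtractAtom(ToyTree& m, int idx) {
    const int n = static_cast<int>(m.element.size());
    if (idx < 0 || idx >= n || n <= 1 || !isLeaf(m, idx)) return false;
    m.element.erase(m.element.begin() + idx);
    m.parent.erase(m.parent.begin() + idx);
    // Atoms stored after `idx` moved down by one, so fix up parent pointers.
    for (int& p : m.parent)
        if (p > idx) --p;
    return true;
}

// 1-propanol, CCCO, heavy atoms only.
ToyTree makePropanol() {
    return ToyTree{{6, 6, 6, 8}, {-1, 0, 1, 2}};
}
```

With this rule, asking to remove the middle carbon of `makePropanol()` is rejected, while removing the terminal oxygen succeeds and leaves a connected three-atom chain.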


The source code consists of 7 header files and 7 cpp files: (see src/)
ELEMENTS.h MOLECULE.h CASES_NEU.h CASES_IL_INDEPENDENT.h CASES_IL.h UTILITY.h PARAMETER.h
ELEMENTS.cpp MOLECULE.cpp CASES_NEU.cpp CASES_IL_INDEPENDENT.cpp CASES_IL.cpp UTILITY.cpp main.cpp

There are several input files to start MARS-PLUS: (see INPUTS/)
INPUTS/control.in : controls the input, output, and calculation options.
INPUTS/ELEMENT_LISTS/element_list.txt : a list that defines base element library.
INPUTS/INPUT_CHEMICALS/IL4.txt : the beginning chemicals.

The calculation results for each operation are written to a file (see LOG_FILES/).
For example, the results of the bond change operation on an IL are written to LOG_FILES/change_bnd_IL.txt


Refs:
[1] Hsu, H.-H.; Huang, C.-H.; Lin, S.-T., New Data Structure for Computational Molecular Design with Atomic or Fragment Resolution.
J. Chem. Inf. Model. 2019, 59, (9), 3703-3713.
(https://github.com/hsuhsuanhao/MARS)

[2] Bjerrum, E. J., SMILES Enumeration as Data Augmentation for Neural Network Modeling of Molecules. 2017.



########## Developers ##########
This program is developed by Chen-Hsuan Huang and Shiang-Tai Lin ([email protected]).
Computational Molecular Engineering Laboratory
Department of Chemical Engineering, National Taiwan University, Taipei, Taiwan



########## Development environment ##########
Linux CentOS 7
g++ compiler from GNU Compiler Collection v9.2.0 (or any compiler supporting C++11)
Open Babel v3.1.0 (compile from source code)
CMake v3.15.5
Make v4.2
(*) RDKit 2020_03_1 (Q1 2020) Release
(*) Boost v1.75.0

(*): Not required for MARS-PLUS (src/UTILITY.cpp),
but required for the standalone SMILES enumerator (src/standalone_SMI_Enum/SMI_Enumerator.cpp).



########## Usage ##########
1. MARS-PLUS
Please read the instructions in INPUTS/control.in and INPUTS/ELEMENT_LISTS/element_list.txt .
Make sure you have properly set the parameters before starting MARS-PLUS.

2. Development environment
Linux CentOS 7
g++ compiler from GNU Compiler Collection v9.2.0 (or any compiler supporting C++11)
Open Babel v3.1.0 (compile from source code)
cmake v3.15.5
make v4.2
cd src/
rm -r ./cmake_install.cmake ./CMakeFiles/ ./CMakeCache.txt 2> /dev/null
cmake ./CMakeLists.txt
make
./MARS-PLUS ./INPUTS/control.in



3. Usage
Please read the instructions in INPUTS/control.in and INPUTS/ELEMENT_LISTS/element_list.txt .
Make sure you have properly set the parameters.
2. Standalone SMILES enumerator
Please read the instructions in src/standalone_SMI_Enum/control.in and src/standalone_SMI_Enum/SMI.txt .
Make sure you have properly set the parameters before starting the standalone SMILES enumerator.

cd src/
rm -r ./cmake_install.cmake ./CMakeFiles/ ./CMakeCache.txt 2> /dev/null
cmake ./CMakeLists.txt
make
./MARS ./INPUTS/control.in
cd src/standalone_SMI_Enum/
rm -r ./cmake_install.cmake ./CMakeFiles/ ./CMakeCache.txt 2> /dev/null
cmake ./CMakeLists.txt
make
./SMI_Enumerator ./control.in

You might also utilize the job schedulers, such as Portable Batch System (PBS) and Simple Linux Utility for Resource Management (Slurm), if available.


You might also utilize the job schedulers (e.g. PBS or Slurm) for both of the programs if available.

4. Developers
This program is developed by Chen-Hsuan Huang and Shiang-Tai Lin.
Computational Molecular Engineering Laboratory
Department of Chemical Engineering, National Taiwan University, Taipei, Taiwan

Correspondent: Shiang-Tai Lin ([email protected])

10 changes: 3 additions & 7 deletions src/CMakeLists.txt
@@ -1,7 +1,6 @@
cmake_minimum_required (VERSION 2.6)


#set(CMAKE_CXX_COMPILER g++)
set(CMAKE_CXX_COMPILER "/home/tom61212/bin/gcc920/bin/g++")
set(CMAKE_C_COMPILER "/home/tom61212/bin/gcc920/bin/gcc")

@@ -18,14 +17,11 @@ set(filename
main.cpp
)

project(GA)
add_executable(MARS ${filename})
#target_link_libraries(MARS "/home/akitainu/bin/openbabel-install/lib/libopenbabel.so.5")
#include_directories("/home/akitainu/bin/openbabel-install/include/openbabel-2.0")
target_link_libraries(MARS "/home/tom61212/bin/obabel-install/lib/libopenbabel.so.7")
project(MARS-PLUS)
add_executable(MARS-PLUS ${filename})
target_link_libraries(MARS-PLUS "/home/tom61212/bin/obabel-install/lib/libopenbabel.so.7")
include_directories("/home/tom61212/bin/obabel-install/include/openbabel3")
include_directories("/home/tom61212/share/Eigen3/eigen-3.3.7/build/include/eigen3")
#include_directories("/home/akitainu/bin/openbabel-install/lib")

message("Default build type is ${CMAKE_BUILD_TYPE}")

101 changes: 101 additions & 0 deletions src/standalone_SMI_Enum/CMakeLists.txt
@@ -0,0 +1,101 @@
cmake_minimum_required (VERSION 3.5)


#set(D_GLIBCXX_USE_CXX11_ABI 0)

set(filename
SMI_Enumerator.cpp
)

set(exename
SMI_Enumerator
)


project(PROG)

# ENV setting
set(RDBASE "/home/tom61212/share/rdkit_py/rdkit")
set(MYBOOST "/home/tom61212/share/boost_1_75_0_py/build")
set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${RDBASE}/Code/cmake/Modules")


# Compiler setting
# note that if you haven't installed/built the toolkit with CoordGen, you'll have problems with this.
set(CMAKE_CXX_COMPILER "/home/tom61212/bin/gcc920/bin/g++")
set(CMAKE_C_COMPILER "/home/tom61212/bin/gcc920/bin/gcc")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall -std=c++11 -DRDK_BUILD_COORDGEN_SUPPORT=ON" )


# Boost settings
#set(MYBOOST "/home/tom61212/share/boost_1_75_0_py/build")
set(Boost_USE_STATIC_LIBS ON)
set(Boost_USE_MULTITHREADED OFF)
set(Boost_USE_STATIC_RUNTIME ON)
include_directories( ${MYBOOST}/include )
link_directories( ${MYBOOST}/lib )

# RDKit setting
# specify where CMake can find the RDKit libraries
include_directories ( ${RDBASE}/Code ) #${CAIRO_INCLUDE_DIRS}
link_directories ( ${RDBASE}/lib )
set(RDKit_LIBS RDKitChemReactions RDKitFileParsers RDKitSmilesParse RDKitDepictor
RDKitRDGeometryLib RDKitRDGeneral RDKitSubstructMatch RDKitSubgraphs
RDKitMolDraw2D RDKitGraphMol RDKitDistGeometry RDKitDistGeomHelpers
RDKitMolAlign RDKitOptimizer RDKitForceField RDKitForceFieldHelpers
RDKitAlignment RDKitForceField RDKitMolTransforms RDKitEigenSolvers
)
find_package (Threads)
set(RDKit_THREAD_LIBS Threads::Threads)


#OBabel setting
#include_directories("/home/akitainu/bin/openbabel-install/include/openbabel-2.0") # .../openbabel-2.0 or .../openbabel3
#link_directories( "/home/akitainu/bin/openbabel-install/lib/libopenbabel.so.5" )
link_libraries( "/home/tom61212/bin/obabel-install/lib/libopenbabel.so.7" )
include_directories("/home/tom61212/bin/obabel-install/include/openbabel3")


#Eigen3 setting
include_directories("/home/tom61212/share/Eigen3/eigen-3.3.7/build/include/eigen3")


#GCC 9.2.0 setting
include_directories("/home/tom61212/bin/gcc920/lib64")


set( LIBS
${RDKIT_LIBRARIES}
${RDKit_THREAD_LIBS}
#"/home/akitainu/bin/openbabel-install/lib/libopenbabel.so.5"
"/home/tom61212/bin/obabel-install/lib/libopenbabel.so.7"
) #${CAIRO_LIBRARIES} Boost::iostreams


set(EXECUTABLE_OUTPUT_PATH ${CMAKE_SOURCE_DIR})
add_executable(${exename} ${filename})
target_link_libraries(${exename} ${LIBS} ${RDKit_LIBS}) #"/home/akitainu/bin/openbabel-install/lib/libopenbabel.so.5"
#set_target_properties(${exename} PROPERTIES
# CXX_STANDARD 11 # C++11...
# CXX_STANDARD_REQUIRED ON #...is required...
# CXX_EXTENSIONS OFF #...without compiler extensions like gnu++11
# )

message("Default build type is ${CMAKE_BUILD_TYPE}")

message("CMAKE_C_FLAGS_DEBUG is ${CMAKE_C_FLAGS_DEBUG}")
message("CMAKE_C_FLAGS_RELEASE is ${CMAKE_C_FLAGS_RELEASE}")
message("CMAKE_C_FLAGS_RELWITHDEBINFO is ${CMAKE_C_FLAGS_RELWITHDEBINFO}")
message("CMAKE_C_FLAGS_MINSIZEREL is ${CMAKE_C_FLAGS_MINSIZEREL}")

message("CMAKE_CXX_FLAGS_DEBUG is ${CMAKE_CXX_FLAGS_DEBUG}")
message("CMAKE_CXX_FLAGS_RELEASE is ${CMAKE_CXX_FLAGS_RELEASE}")
message("CMAKE_CXX_FLAGS_RELWITHDEBINFO is ${CMAKE_CXX_FLAGS_RELWITHDEBINFO}")
message("CMAKE_CXX_FLAGS_MINSIZEREL is ${CMAKE_CXX_FLAGS_MINSIZEREL}")

if (NOT CMAKE_BUILD_TYPE)
message(STATUS "No build type selected, default to Debug")
set(CMAKE_BUILD_TYPE "debug")
endif()


54 changes: 54 additions & 0 deletions src/standalone_SMI_Enum/README.txt
@@ -0,0 +1,54 @@
########## Introduction to standalone SMILES enumerator ##########
The SMILES enumerator can generate many synonymous SMILES from a given SMILES, and has been
found useful for improving NN-based models. [1]
We provide a standalone SMILES enumerator (see src/standalone_SMI_Enum/SMI_Enumerator.cpp)
that can compare the performance of the Open Babel-based and RDKit-based enumerators.
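
To make "synonymous SMILES" concrete, the toy C++ sketch below writes one SMILES string per choice of starting atom for a small acyclic molecule with single bonds only. It does not use Open Babel or RDKit and is not the method in SMI_Enumerator.cpp; `ToyGraph`, `writeFrom`, `enumerateByRoot`, and `makeEthanol` are hypothetical names introduced for illustration, and rings, bond orders, charges, and stereochemistry are deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy molecular graph: acyclic, single bonds only, organic-subset atoms.
struct ToyGraph {
    std::vector<std::string> symbol;          // e.g. "C", "O"
    std::vector<std::vector<int>> neighbors;  // adjacency list
};

// Depth-first SMILES writer starting at `atom`, having arrived from `from`
// (-1 when `atom` is the starting atom).
std::string writeFrom(const ToyGraph& g, int atom, int from) {
    std::string out = g.symbol[atom];
    std::vector<int> next;
    for (int nb : g.neighbors[atom])
        if (nb != from) next.push_back(nb);
    for (std::size_t i = 0; i < next.size(); ++i) {
        std::string branch = writeFrom(g, next[i], atom);
        // Every branch except the last one is wrapped in parentheses.
        if (i + 1 < next.size()) out += "(" + branch + ")";
        else out += branch;
    }
    return out;
}

// One SMILES per starting atom; identical strings are collapsed by the set.
std::set<std::string> enumerateByRoot(const ToyGraph& g) {
    std::set<std::string> result;
    for (int root = 0; root < static_cast<int>(g.symbol.size()); ++root)
        result.insert(writeFrom(g, root, -1));
    return result;
}

// Ethanol as the chain C-C-O.
ToyGraph makeEthanol() {
    return ToyGraph{{"C", "C", "O"}, {{1}, {0, 2}, {1}}};
}
```

For `makeEthanol()` this yields the three strings CCO, C(C)O, and OCC, which all describe the same molecule. A full enumerator would also vary the order in which neighbors are visited and handle the features omitted here.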

There are several input files to start the standalone SMILES enumerator: (see src/standalone_SMI_Enum/)
src/standalone_SMI_Enum/control.in : controls the input, output, and calculation options.
src/standalone_SMI_Enum/SMI.txt : the list of input SMILES.

The calculation results for each of the operations will be outputted as a file (see LOG_FILES/).
For example, the results of bond change operation on an IL will be outputted to LOG_FILES/change_bnd_IL.txt


Refs:
[1] Bjerrum, E. J., SMILES Enumeration as Data Augmentation for Neural Network Modeling of Molecules. 2017.



########## Developers ##########
This program is developed by Chen-Hsuan Huang and Shiang-Tai Lin ([email protected]).
Computational Molecular Engineering Laboratory
Department of Chemical Engineering, National Taiwan University, Taipei, Taiwan



########## Development environment ##########
Linux CentOS 7
g++ compiler from GNU Compiler Collection v9.2.0 (or any compiler supporting C++11)
Open Babel v3.1.0 (compile from source code)
CMake v3.15.5
Make v4.2
RDKit 2020_03_1 (Q1 2020) Release
Boost v1.75.0



########## Usage ##########
1. Standalone SMILES enumerator
Please read the instructions in src/standalone_SMI_Enum/control.in and src/standalone_SMI_Enum/SMI.txt .
Make sure you have properly set the parameters before starting the standalone SMILES enumerator.

cd src/standalone_SMI_Enum/
rm -r ./cmake_install.cmake ./CMakeFiles/ ./CMakeCache.txt 2> /dev/null
cmake ./CMakeLists.txt
make
./SMI_Enumerator ./control.in



You might also utilize the job schedulers (e.g. PBS or Slurm) for both of the programs if available.



6 changes: 6 additions & 0 deletions src/standalone_SMI_Enum/SMI.txt
@@ -0,0 +1,6 @@
c1ccccc1c1ccccc1
Cc1ccccc1
C/C=C/[C@H](n1cc[n+](c1)C)F
C/C=C/[C@H](n1cc[n+](c1c1cccc[nH+]1)C)F
C[C@@](C[P@](=O)(OC[C@@H](I)S)[O-])(Cl)O
CC(C)(C)OC(=O)CC(=O)OC
Binary file added src/standalone_SMI_Enum/SMI_Enumerator
Binary file not shown.