SEED: Software for the Extraction of Equations from Data
SEED is software written in Python for extracting governing differential equations from data. We have collated various algorithms written for research purposes into a single toolbox and provided a GUI to make them easy to use. Currently, two algorithms are integrated into SEED:
- The MATLAB SINDy codebase written by the Kutz Research Group to accompany their papers "Discovering governing equations from data by sparse identification of nonlinear dynamical systems" and "Data-driven discovery of partial differential equations"
- PySINDy, written by Yuying Liu along with Yi Chu and Lianzong Wang, following on from the research of the Kutz Research Group and other SINDy research
Both are implementations of the Sparse Identification of Nonlinear Dynamics (SINDy) method, each implementing it differently. We are aware that other equation-discovery methods exist.
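For readers unfamiliar with SINDy, the core idea can be sketched in a few lines: build a library of candidate terms, then use sequentially thresholded least squares to keep only the terms that matter. The sketch below is a generic illustration using NumPy, not the code of either integrated algorithm; the function name `stlsq`, the threshold value, and the toy data are all assumptions made for this example.

```python
import numpy as np

def stlsq(theta, dxdt, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares.

    Solves theta @ xi ~= dxdt, repeatedly zeroing coefficients whose
    magnitude is below `threshold` and refitting the remaining ones.
    """
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        for k in range(dxdt.shape[1]):
            big = ~small[:, k]
            if big.any():
                # Refit using only the library terms that survived.
                xi[big, k] = np.linalg.lstsq(theta[:, big], dxdt[:, k], rcond=None)[0]
    return xi

# Toy system dx/dt = -2x, with the exact derivative supplied so that
# the sketch is free of numerical differentiation error.
t = np.linspace(0, 2, 200)
x = np.exp(-2 * t).reshape(-1, 1)
dxdt = -2 * x

# Candidate library: the terms 1, x and x^2.
theta = np.hstack([np.ones_like(x), x, x**2])
xi = stlsq(theta, dxdt)
print(xi.ravel())  # coefficients for the terms 1, x, x^2
```

Only the coefficient on `x` should survive the thresholding, which corresponds to recovering dx/dt = -2x from the data.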
We have edited the examples provided with each algorithm in order to integrate them into SEED, but they otherwise remain unedited.
We have also adapted each algorithm so that SEED can import a user's own data, enabling the analysis of further real-world datasets.
SEED has a simple and intuitive Graphical User Interface (GUI) so that researchers in a wide variety of fields can analyse their data using cutting-edge methods without needing to know any programming.
SEED was also written so that its capabilities can be expanded easily, allowing users with programming knowledge to extend and improve the software.
To run SEED, you must have Python 3.6 or 3.7 installed on your computer. If you do not, it can be downloaded and installed from the Python website. If you are a Windows user, ensure you check the box "Add Python 3.7 to Path" on the first page of the installer.
To run the MATLAB examples written by the Kutz Research Group, you need to install MATLAB. Because MATLAB has to be purchased, it is not required to run SEED itself.
If you have MATLAB installed and would like to run the MATLAB examples, you must also install the MATLAB Engine for Python for the same version of Python you are using on your computer. This can be done by following the instructions on the MATLAB website.
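If you are unsure whether the MATLAB Engine for Python is installed for your interpreter, a quick check along the following lines may help. This is a sketch and not part of SEED; the helper name `matlab_engine_available` is an assumption, while `matlab.engine` is the module name the MATLAB Engine installs.

```python
import importlib.util

def matlab_engine_available():
    """Return True if the `matlab.engine` module can be found by this Python."""
    try:
        return importlib.util.find_spec("matlab.engine") is not None
    except ModuleNotFoundError:
        # Raised when the parent `matlab` package itself is missing.
        return False

if matlab_engine_available():
    print("MATLAB Engine for Python found.")
else:
    print("MATLAB Engine for Python not found; the MATLAB examples will not run.")
```

Run it with the same Python interpreter you use for SEED, since the engine is installed per interpreter.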
After downloading the source code from GitHub, keep all of the files together in a single folder, in any location you like. This allows the programme to find the correct file paths to run the examples.
Before running SEED, you must install the Python packages that the programme needs. You can do this by running one of the following commands in the terminal or command line. (If you intend to run SEED through Jupyter Notebook, see the next paragraph.)
- Mac:
python3 -m pip install --user numpy scipy matplotlib pysindy findiff pytest pylint sphinx
- Windows:
python -m pip install --user numpy scipy matplotlib pysindy findiff pytest pylint sphinx
You can also use SEED through Jupyter Notebook. A .ipynb file is included as well as the .py file. The code is the same. Before running SEED in Jupyter Notebook, it is vital to install a few Python packages needed for the programme to run. You can do this by running these commands in the terminal or command line:
python -m pip install --user numpy scipy matplotlib pysindy findiff pytest pylint sphinx
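To confirm that the packages from the install commands above are visible to your Python interpreter, something like the following can be run. It is a minimal sketch and not part of SEED; the helper name `missing_packages` is an assumption, and the package list is copied from the install commands.

```python
import importlib.util

# Package names taken from the pip install commands above.
REQUIRED = ["numpy", "scipy", "matplotlib", "pysindy",
            "findiff", "pytest", "pylint", "sphinx"]

def missing_packages(names):
    """Return the subset of `names` that this interpreter cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

Any package reported as missing can be installed by rerunning the relevant pip command.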
To run SEED, open IDLE (included with the Python download) and open the file SEED.py. Click Run > Run Module on the toolbar to run the software. If running SEED through Jupyter Notebook, open SEED.ipynb in a Jupyter Notebook server and run all of the cells.
The GUI will start up and will look like this:
- Mac:
- Windows:
The algorithms that have been integrated into SEED come with their own set of examples that were provided with the original research. We have edited the examples to allow for their integration into SEED, but they otherwise remain unedited.
The data needed to run the third PySINDy example was too large to upload to GitHub. The generation script, reaction_diffusion.m, is included in the Algorithms > pySINDy > datasets directory. MATLAB is needed to run the script and generate the reaction_diffusion.mat data file.
After you run an example, the output will look like this:
- Mac:
- Windows:
To understand how to interpret the output fully, consult the algorithm's documentation linked in the Introduction.
To use your own data with the algorithms, you must save the data as a .csv file with one column of time series data and up to three further columns containing the data for each recorded variable. The first row of your .csv file must contain the name of each variable.
An example is shown below:
You must then save your .csv file in the SEED > Data folder so that the programme can find it. An example data file, 3d_data.csv, is included in that folder.
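Before loading a file into SEED, you may want to check that it matches the layout described above. The following is a minimal sketch using only the standard library; it is not SEED's own loader. The function name `read_seed_csv`, the assumption that every cell is numeric, and the sample values are all made up for illustration.

```python
import csv
import io

def read_seed_csv(handle):
    """Read a CSV with a header row, one time column and 1-3 variable columns.

    Returns (names, columns), where columns maps each name to a list of floats.
    """
    rows = [row for row in csv.reader(handle) if row]  # skip blank lines
    if len(rows) < 2:
        raise ValueError("expected a header row and at least one data row")
    names = [name.strip() for name in rows[0]]
    if not 2 <= len(names) <= 4:
        raise ValueError("expected a time column plus one to three variable columns")
    columns = {name: [] for name in names}
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(names):
            raise ValueError(f"row {line_no} has {len(row)} cells, expected {len(names)}")
        for name, cell in zip(names, row):
            columns[name].append(float(cell))
    return names, columns

# Illustrative content only; a real file would live in the SEED > Data folder.
sample = "t,x,y\n0.0,1.0,0.0\n0.1,0.995,0.0998\n0.2,0.980,0.199\n"
names, columns = read_seed_csv(io.StringIO(sample))
print(names, len(columns["t"]), "rows")
```

To check a real file, pass an open file object instead of the `io.StringIO` wrapper, for example `open("Data/3d_data.csv", newline="")`.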
As mentioned in the introduction, you can add your own algorithms to SEED if you wish. Currently only algorithms in Python and MATLAB are supported. There are four important things to do when preparing your algorithm for SEED:
- Store the code files in a specific folder
There is a folder in the SEED source files called Algorithms. All code for the new algorithm must be stored here, in a new folder with the same name as your algorithm.
- Example folder
Within your new algorithm folder, any executable files, in Python or MATLAB, that you would like to run through the GUI must be in a sub-folder named examples.
- Function names and variables
- Python
Each file in your examples folder must be an executable .py file containing two functions. The name of the file will be displayed in the Examples/Own Data dropdown box on the GUI. The first function is called example and has no inputs and two outputs: the output coefficients and the descriptors of the algorithm. The second function is called get_params and has no inputs and two outputs: a list of the inbuilt variable names and a list of their values. These lists can be left empty if there is no information to print to the GUI.
- MATLAB
Each file in your examples folder must be an executable .m file containing three functions. The name of the file will be displayed in the Examples/Own Data dropdown box on the GUI. The first function must have the same name as the file. It should match the corresponding function in one of the examples included with SEED, and contains an if statement that detects how many inputs the function was called with: with no inputs it runs run, and with one input it runs get_params. This function can be copied from one of the inbuilt examples, but make sure to change its name. The second function is called run and has no inputs and two outputs: the output coefficients and the descriptors of the algorithm. The third function is called get_params and has no inputs and two outputs: 1D Cell Arrays of the inbuilt variable names and their values. These can be left empty if there is no information to print to the GUI.
Below is a table containing a summary of the function inputs and outputs. Look at the code for the inbuilt example algorithms to see examples of their implementations.
| Language | Function | Inputs | Input Contents | Outputs | Output Contents | Object Type |
|---|---|---|---|---|---|---|
| Python | example | N/A | N/A | x | Output Coefficients | 2D NumPy Array |
| | | | | y | Output Descriptors | List |
| Python | get_params | N/A | N/A | variables | Inbuilt Variable Names | List |
| | | | | values | Inbuilt Variable Values | List |
| MATLAB | example_name | N/A | To run run | vals | Values | Dependent |
| | | 1 | To run get_params | vars | Variable Names | List |
| MATLAB | run | N/A | N/A | vals | Output Coefficients | 2D Array |
| | | | | vars | Output Descriptors | List |
| MATLAB | get_params | N/A | N/A | variables | Inbuilt Variable Names | 1D Cell Array |
| | | | | values | Inbuilt Variable Values | 1D Cell Array |
- Any other files
All other files needed for your code to work should be stored in your algorithm's main folder.
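As a rough illustration of the Python requirements above, a file in your examples folder might be laid out as follows. This is a skeleton sketch, not one of the inbuilt examples: the coefficient values, descriptor strings, and parameter names are placeholders, and a real file would call your algorithm where indicated.

```python
import numpy as np

def example():
    """Run the algorithm and return (output coefficients, output descriptors)."""
    # Placeholder result; a real example would call your algorithm here.
    coefficients = np.array([[0.0, -1.0],
                             [1.0, 0.0]])   # 2D NumPy array
    descriptors = ["x", "y"]                # list of strings
    return coefficients, descriptors

def get_params():
    """Return (inbuilt variable names, inbuilt variable values) for the GUI."""
    # Placeholder parameters; both lists may be left empty.
    variables = ["threshold", "max_iterations"]
    values = [0.1, 10]
    return variables, values

if __name__ == "__main__":
    coefficients, descriptors = example()
    print(descriptors)
    print(coefficients)
```

Both functions take no arguments and return two values, matching the summary table. How the descriptors relate to the rows or columns of the coefficient array depends on your algorithm, so check the inbuilt examples for the convention they follow.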
The MIT License is used for this software. For more information see: License info