The URL of this repository can be found at URL. The repository consists of a `data`, `notebook`, and `script` directory.
The best track and SHIPS data can be retrieved from NCEI-NOAA and RAMMB-CIRA, respectively. A compressed version of the data can be found here. The metadata for the best track and SHIPS data can be found at IBTrACS and SHIPS.
The data can be converted to a .csv file using the Notebook.
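As a rough illustration, a small comma-separated best track sample can be parsed with pandas and written back out as a `.csv` file. The column names and values below are stand-ins, not the actual IBTrACS schema:

```python
import io
import pandas as pd

# Hypothetical best track records; the columns are illustrative,
# not the real IBTrACS layout.
raw = io.StringIO(
    "SID,NAME,ISO_TIME,LAT,LON,WMO_WIND\n"
    "2022239N13299,FIONA,2022-09-14 12:00:00,16.9,-49.4,35\n"
    "2022239N13299,FIONA,2022-09-14 18:00:00,16.7,-51.2,40\n"
)

df = pd.read_csv(raw, parse_dates=["ISO_TIME"])
df.to_csv("best_track_sample.csv", index=False)
print(df.shape)  # (2, 6)
```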
Tropical cyclones are natural disasters that cause severe impacts to coastal communities. At the coast these impacts are manifested as extreme rain, wind, and storm surge. It is important to provide these communities with accurate storm tracks and intensity (peak winds and minimum sea level pressure) before these storms make landfall. With this in mind, the objectives of this project are to:
- Develop a forecast model to predict tropical cyclone tracks up to 24 hours ahead.
- Forecast peak wind speeds up to 24 hours ahead.
The repository contains the following notebooks, which perform these functions:
- `Download_Data.ipynb`: retrieves the raw best track and SHIPS data.
- `clean_ibtracs_data.ipynb`: cleans up the best track data and produces the `cleaned_best_track_data.csv` file.
- `clean_SHIPS_data.ipynb`: cleans up the SHIPS data and produces the `cleaned_SHIPS_data.csv` file.
- `Prepare_AI_Ready_Data.ipynb`: creates the AI-ready data as a csv file, `ai_ready_SHIPS_data.csv`.
- `EDA.ipynb`: explores the distribution and correlation of features in the AI-ready data.
- `Dimentionality_Reduction.ipynb`: reduces the dimensionality of the AI-ready data using PCA and t-SNE.
- `Cluster_Analysis.ipynb`: uses cluster analysis to identify storm genesis locations in the Atlantic Ocean.
- `Model_Training_Assessment.ipynb`: trains the forecast models and assesses their performance.
- `Computation_Time_Analysis.ipynb`: compares the trade-off between the time taken to run the model and its accuracy.
- `Model_Benchmarking_Against_CML.ipynb`: benchmarks the deep learning models against classic machine learning.
- `Model_Architecture_Exploration.ipynb`: explores different deep learning architectures for the forecasting problem.
```bash
git clone https://github.com/UW-MLGEO/MLGEO2024_TC_Tracks_Intensity.git
conda env create -f environment.yml
conda activate mlgeo_dataset
```
- Running `conda env create -f environment.yml` should create the environment and install the required modules.
- Select the `mlgeo_dataset` environment.
To run the notebook:
- Import the appropriate libraries
- Run the Reading and Extracting Data code block, which will extract the SHIPS and IBTracs data.
- The data will be downloaded to your current directory. Move the data files to the `data/raw` directory.
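The move step above can be sketched as follows; a dummy file stands in for the real download, and the destination layout follows the `data/raw` directory named in the repository description:

```python
import shutil
from pathlib import Path

# The notebook drops the downloaded files in the current directory;
# they then need to be moved into data/raw.
raw_dir = Path("data") / "raw"
raw_dir.mkdir(parents=True, exist_ok=True)

# Stand-in for the real downloaded SHIPS file.
downloaded = Path("lsdiaga_1982_2022_sat_ts_5day.txt")
downloaded.write_text("placeholder SHIPS record\n")

shutil.move(str(downloaded), str(raw_dir / downloaded.name))
print((raw_dir / downloaded.name).exists())  # True
```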
- Import the appropriate libraries
- Run the Reading Data code block to open the `lsdiaga_1982_2022_sat_ts_5day.txt` file.
- Execute the next code block, which consists of a series of functions to help extract the data of interest.
- Run the next code block to create empty arrays for storing the data of interest.
- Extract the data using the functions from the Function code block.
- Execute the Clean Up Data code block, which removes outliers, converts units, and sets invalid entries to `NaN`.
- Create the dataframe using the Create Dataframe code block. This code block also ensures that the data is cleaned appropriately.
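A minimal sketch of that cleaning pass, assuming a 9999 missing-value sentinel, a knots-to-m/s unit conversion, and a 200 kt outlier cap (all three are illustrative assumptions, not the notebook's exact rules):

```python
import numpy as np
import pandas as pd

# Stand-in data: peak wind in knots and minimum sea level pressure,
# with 9999 marking invalid entries (an assumed sentinel).
df = pd.DataFrame({
    "vmax_kt": [35, 9999, 120, 40],
    "mslp_hpa": [1005, 950, 9999, 1002],
})

df = df.replace(9999, np.nan)              # invalid entries -> NaN
df["vmax_ms"] = df["vmax_kt"] * 0.514444   # knots -> m/s
# Drop physically implausible winds, keeping NaNs for later handling.
df = df[(df["vmax_kt"] < 200) | df["vmax_kt"].isna()]

print(df["vmax_kt"].isna().sum())  # 1
```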
- Import the appropriate libraries
- Set the file paths and open the file with the first code block
- The notebook provides a list of all the features in the data
- Execute the next code block to select the most meaningful data
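A hedged sketch of the feature-selection step that produces the AI-ready file; the column names and the kept set are hypothetical stand-ins for whatever the notebook deems most meaningful:

```python
import pandas as pd

# Stand-in cleaned SHIPS data with one deliberately uninformative column.
df = pd.DataFrame({
    "LAT": [16.9, 16.7], "LON": [-49.4, -51.2],
    "VMAX": [35, 40], "SHEAR": [12.0, 9.5], "NOISE_COL": [1, 2],
})

selected = ["LAT", "LON", "VMAX", "SHEAR"]  # assumed "meaningful" subset
df[selected].to_csv("ai_ready_SHIPS_data.csv", index=False)

ai_ready = pd.read_csv("ai_ready_SHIPS_data.csv")
print(ai_ready.columns.tolist())
```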
- Import the appropriate libraries
- Ensure file paths are set appropriately
- Investigate basic statistics of the data (min, max, mean, stdev) using the 4th code block
- Execute the next code block to visualize the distribution of the features
- Investigate the correlation of features by executing the final two code blocks
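The statistics and correlation steps can be sketched on stand-in data like so (peak wind and minimum pressure are anticorrelated in real TC data, which the toy values mimic):

```python
import pandas as pd

# Toy AI-ready features: peak wind (kt) and minimum pressure (hPa).
df = pd.DataFrame({
    "VMAX": [35, 40, 55, 70, 90],
    "MSLP": [1005, 1000, 990, 975, 955],
})

stats = df.describe()   # min, max, mean, std, quartiles per feature
corr = df.corr()        # pairwise feature correlations
print(corr.loc["VMAX", "MSLP"])
```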
- Import the appropriate libraries
- Ensure file paths are set appropriately
- Visualize the first few columns
- Remove unwanted data with the next code block
- Perform PCA analysis with the next 3 code blocks
- The first investigates all the data and the remaining two are specific to storm intensity and TC tracking
- Perform t-Distributed Stochastic Neighbor Embedding (t-SNE) with the remaining script
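The PCA and t-SNE steps can be sketched with scikit-learn on random stand-in features; the component counts and perplexity are illustrative choices, not the notebook's settings:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Random stand-in feature matrix: 100 samples, 10 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))

pca = PCA(n_components=3)
X_pca = pca.fit_transform(X)   # linear projection onto 3 components

tsne = TSNE(n_components=2, perplexity=30, random_state=0)
X_tsne = tsne.fit_transform(X)  # nonlinear 2-D embedding

print(X_pca.shape, X_tsne.shape)  # (100, 3) (100, 2)
```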