Machine Learning class work:
- First, we convert the XML file with the traffic incidents in Bizkaia to a CSV file.
Script: Sesion1/xml_to_csv.py
Input: Data/IncidenciasTDTHist.xml
Output: Data/IncidenciasTDTGeo.csv
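A minimal sketch of such a conversion, assuming the XML stores each incident as an element with attributes (the element and attribute names below are invented for illustration, not the real IncidenciasTDTHist.xml schema):

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample mimicking the incident XML (names are assumptions).
SAMPLE_XML = """<incidencias>
  <incidencia tipo="Accidente" provincia="Bizkaia"
              latitud="43.26" longitud="-2.93"/>
  <incidencia tipo="Obras" provincia="Bizkaia"
              latitud="43.30" longitud="-2.99"/>
</incidencias>"""

def xml_to_rows(xml_text):
    # One dict per <incidencia> element, built from its attributes.
    root = ET.fromstring(xml_text)
    return [inc.attrib for inc in root.iter("incidencia")]

def write_csv(rows, fh):
    # Stable column order so the CSV header is deterministic.
    writer = csv.DictWriter(fh, fieldnames=sorted(rows[0]))
    writer.writeheader()
    writer.writerows(rows)

rows = xml_to_rows(SAMPLE_XML)
buf = io.StringIO()  # stands in for Data/IncidenciasTDTGeo.csv
write_csv(rows, buf)
```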
- We select the accident incidents from the CSV file that contains all the incidents.
Script: Sesion2/extract_accidents.py
Input: Data/IncidenciasTDTGeo.csv
Output: Data/Accidents.csv
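The selection amounts to a row filter; a sketch with pandas, using a hypothetical `tipo` column and inline data in place of the real CSV:

```python
import pandas as pd

# Toy stand-in for IncidenciasTDTGeo.csv (column/value names are assumptions).
incidences = pd.DataFrame({
    "tipo": ["Accidente", "Obras", "Accidente", "Retencion"],
    "latitud": [43.26, 43.30, 43.21, 43.25],
    "longitud": [-2.93, -2.99, -2.95, -2.90],
})

# Keep only the accident rows.
accidents = incidences[incidences["tipo"] == "Accidente"].reset_index(drop=True)
```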
- Next, we apply the DBSCAN algorithm to cluster the accidents. Each cluster defines an accident zone.
Script: Sesion2/DBSCAN_accidents.py
Input: Data/Accidents.csv
Output: Data/Accidents_with_zones_dbscan.csv
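The clustering step can be sketched with scikit-learn on synthetic coordinates; the `eps` and `min_samples` values are illustrative, not the ones tuned in the real script:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two synthetic accident hot spots plus one isolated point.
coords = np.array([
    [43.260, -2.930], [43.261, -2.931], [43.259, -2.929],  # hot spot A
    [43.300, -2.990], [43.301, -2.991], [43.299, -2.989],  # hot spot B
    [43.100, -2.500],                                       # isolated
])

# DBSCAN assigns a zone label per point; -1 marks noise (no zone).
labels = DBSCAN(eps=0.01, min_samples=3).fit_predict(coords)
```

A useful property here is that DBSCAN finds the number of zones on its own and leaves isolated accidents unassigned instead of forcing them into a zone.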
- We apply the Spectral Clustering and K-means algorithms to Accidents.csv. Each cluster defines an accident zone. We discard the Spectral Clustering results because they do not make sense.
Script: Sesion3/Spectral-kmeans.py
Input: Data/Accidents.csv
Output: Data/Accidents_with_zones_kmeans.csv
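Since only the K-means results are kept, a sketch of that half on synthetic coordinates (`n_clusters` is illustrative; unlike DBSCAN, K-means needs it fixed in advance):

```python
import numpy as np
from sklearn.cluster import KMeans

# Two synthetic accident hot spots.
coords = np.array([
    [43.260, -2.930], [43.261, -2.931],
    [43.300, -2.990], [43.301, -2.991],
])

# Every point gets a zone; K-means has no notion of noise.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(coords)
labels = km.labels_
```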
- We characterize the different zones based on the features of the accidents that have occurred in each zone.
Script: Sesion4/extract_features_from_zones.py
Input: Data/Accidents_with_zones_dbscan.csv, Data/Accidents_with_zones_kmeans.csv
Output: Data/Zonas_dbscan.csv, Data/Zonas_kmeans.csv
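Going from labelled accidents to one row per zone is a group-by aggregation; a sketch with hypothetical features (the real script derives its own set):

```python
import pandas as pd

# Toy stand-in for Accidents_with_zones_dbscan.csv.
accidents = pd.DataFrame({
    "zone": [0, 0, 1, 1, 1],
    "latitud": [43.26, 43.27, 43.30, 43.31, 43.29],
    "longitud": [-2.93, -2.94, -2.99, -3.00, -2.98],
})

# One row per zone: accident count plus mean position
# (illustrative features only).
zones = accidents.groupby("zone").agg(
    n_accidents=("zone", "size"),
    lat=("latitud", "mean"),
    lon=("longitud", "mean"),
).reset_index()
```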
- We apply PCA to the zones and perform hierarchical clustering on the results. Then we define zone groups with their features.
Script: Sesion4/PCA_hierarchical.py
Input: Data/Zonas_dbscan.csv
Output: Data/Grupos_zonas.csv
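The PCA-then-hierarchical pipeline can be sketched as follows, with a synthetic per-zone feature matrix standing in for Zonas_dbscan.csv:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import AgglomerativeClustering

# Synthetic zone features (rows = zones, columns = features).
X = np.array([
    [10.0, 1.0, 0.2],
    [11.0, 1.1, 0.3],
    [ 2.0, 5.0, 0.9],
    [ 2.5, 5.2, 1.0],
])

# Reduce dimensionality first, then cluster zones hierarchically
# into groups (n_components/n_clusters are illustrative).
X2 = PCA(n_components=2).fit_transform(X)
groups = AgglomerativeClustering(n_clusters=2).fit_predict(X2)
```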
- We modify the Step 2 script to filter the accidents better, and then rerun everything a second time with the filtered accidents.
- We select the road works from the initial incidents CSV file in order to predict their zones.
Script: Sesion5/extract_works.py
Input: Data/IncidenciasTDTGeo.csv
Output: Data/Works.csv
- We create a KNN model, trained on the accidents and their zones, to predict the zone of each work.
Script: Sesion5/KNN-works.py
Input: Data/Works.csv
Output: Works_zones.csv
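A sketch of this step, with synthetic coordinates in place of the accident and works files (the real script also loads the labelled accidents alongside Works.csv):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Train on accident positions labelled with their zone.
accident_coords = np.array([
    [43.260, -2.930], [43.261, -2.931],
    [43.300, -2.990], [43.301, -2.991],
])
accident_zones = np.array([0, 0, 1, 1])

knn = KNeighborsClassifier(n_neighbors=1).fit(accident_coords, accident_zones)

# Assign each road work to the zone of its nearest accident(s).
work_coords = np.array([[43.259, -2.929], [43.302, -2.992]])
work_zones = knn.predict(work_coords)
```

`n_neighbors=1` is illustrative; a larger odd value would make the zone vote more robust to stray accidents.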
- We predict the cluster group of each zone with a decision tree and extract the feature importances. Because the decision tree implementation has a random factor, we run Random Forest several times in order to improve the accuracy of the feature importances.
Script: Sesion6/DecisionTree.py
Script: Sesion6/RandomForest.py
Input: Data/Zones_labels.csv
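The averaging idea can be sketched as follows, with a synthetic feature matrix standing in for Zones_labels.csv; here only the first feature determines the label, so it should dominate the averaged importances:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic zone features: the label depends only on column 0.
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)

# Averaging importances over several seeds smooths out the
# randomness of any single forest.
importances = np.mean(
    [RandomForestClassifier(n_estimators=50, random_state=s)
     .fit(X, y).feature_importances_
     for s in range(5)],
    axis=0,
)
```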
- Next, we filter the 2007 works to remove the duplicated works.
Script: Sesion7/FilterWorks.py
Input: Data/Works2007.csv
Output: Data/Works2007_filtered.csv
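De-duplication is a one-liner in pandas; the key columns below are hypothetical (the real script may match on different fields):

```python
import pandas as pd

# Toy stand-in for Works2007.csv with one repeated work.
works = pd.DataFrame({
    "road": ["A-8", "A-8", "N-634"],
    "latitud": [43.26, 43.26, 43.30],
    "longitud": [-2.93, -2.93, -2.99],
})

# Keep the first occurrence of each (road, position) combination.
filtered = works.drop_duplicates(subset=["road", "latitud", "longitud"])
```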
- Next, we use the zones and the 2007 works to create a prediction model with the Decision Tree algorithm that predicts the number of works in each zone.
Script: Sesion7/WorksPrediction.py
Input: Data/Works2007_filtered.csv
Output: Data/Zones_with_number_works.csv, Data/Zones_with_discrete_works.csv
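Since the target is a count, this is a regression tree; a sketch with synthetic zone features and work counts in place of the real files:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic zone features and observed work counts
# (the real features come from the zone files built earlier).
X = np.array([[1.0, 0.2], [1.1, 0.3], [5.0, 0.9], [5.2, 1.0]])
n_works = np.array([2, 3, 10, 11])

# max_depth is illustrative; it caps the tree to avoid overfitting.
tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, n_works)

# Predicted number of works for an unseen zone.
pred = tree.predict([[5.1, 0.95]])
```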
- Each script has a Jupyter notebook with comments.