Neaten analysis folder notebooks (#62)
* Rename and delete notebooks in skills extraction analysis

* Update analysis folder readme

* Add skills taxonomy notebooks

* Add description of skills taxonomy analysis folder to readme

* Add taxonomy application notebooks and update readme

* Add textkernel sample analysis notebooks

* delete esco skills analysis

* adding to analysis readme

* correct output location for tk analysis
lizgzil authored Sep 23, 2021
1 parent de0c02e commit 5c19a3e
Showing 23 changed files with 3,886 additions and 1,796 deletions.
53 changes: 53 additions & 0 deletions skills_taxonomy_v2/analysis/README.md
@@ -0,0 +1,53 @@
# Analysis of skills and the skills taxonomy

This analysis folder contains analysis and experimentation notebooks for 5 themes:
1. `tk_analysis/`
2. `sentence_classifier/`
3. `skills_extraction/`
4. `skills_taxonomy/`
5. `skills_taxonomy_application/`

Outputs (figures and data) from analysis are saved to a corresponding folder in the `outputs/` folder.

## `tk_analysis/`

This folder contains two notebooks:
1. `TextKernel Data.ipynb` - Provides a summary of the TextKernel dataset.
2. `TextKernel Data Sample for Skills.ipynb` - Compares our sample of TextKernel data with the full dataset.

Outputs are in `outputs/tk_analysis`.

## `sentence_classifier/`

## `skills_extraction/`

In this folder we have two notebooks for analysis and figure plotting after extracting skills:
1. `Effect of sample size.ipynb` - Investigates how the number of skill sentences sampled affects the number of words in the vocabulary.
2. `Skills Extraction Analysis and Figures.ipynb` - Analysis and figures of the extracted skills. Outputs are in `outputs/skills_extraction/figures/..`
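The sample-size question in the first notebook can be illustrated with a minimal sketch, assuming a plain list of skill sentences and a simple lowercase whitespace tokeniser. The helper `vocab_size_by_sample_size` is hypothetical and is not the notebook's actual code; it only shows the idea of measuring vocabulary size over growing, nested samples.

```python
import random
from typing import Dict, Sequence


def vocab_size_by_sample_size(
    sentences: Sequence[str], sample_sizes: Sequence[int], seed: int = 42
) -> Dict[int, int]:
    """Hypothetical helper: vocabulary size for nested random samples of sentences.

    Sentences are shuffled once, so each larger sample contains every smaller one.
    Tokenisation is a simple lowercase whitespace split (an assumption).
    """
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)
    vocab: set = set()
    sizes: Dict[int, int] = {}
    consumed = 0
    for n in sorted(set(sample_sizes)):
        # Only tokenise the sentences added since the previous sample size
        for sentence in shuffled[consumed:n]:
            vocab.update(sentence.lower().split())
        consumed = n
        sizes[n] = len(vocab)
    return sizes


skill_sentences = [
    "Good communication skills",
    "Strong Python skills",
    "Excellent communication",
]
print(vocab_size_by_sample_size(skill_sentences, [1, 2, 3]))
```

Using nested samples means the vocabulary size can only grow with the sample size, which makes any plateau in the curve easier to read.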

In this folder we also have experimentation notebooks showing 4 approaches to skills extraction:
1. Network approach
2. Transformers sentence embeddings approach - no masking
3. Word2vec approach
4. Transformers sentence embeddings approach - masking

The last approach is the one used in the final pipeline (refactored in `skills_taxonomy_v2/pipeline/skills_extraction/`).

## `skills_taxonomy/`

In this folder we perform some analysis on the skills taxonomy created when running `skills_taxonomy_v2/pipeline/skills_taxonomy/build_taxonomy.py`.

1. `Evaluate hierarchy.ipynb` - Evaluate hierarchy based on popular skill groups for job titles. Output csvs stored in `outputs/skills_taxonomy/evaluation/`.
2. `Tranversal Skills.ipynb` - Identify the most and least transversal skills and skill groups. Outputs in `outputs/skills_taxonomy/transversal/`.
3. `Renaming sample of skill groups.ipynb` - Manually create names for a sample of the skill groups. Outputs `skills_taxonomy_v2/utils/2021.09.06_level_a_rename_dict.json`, which is used in other notebooks.
4. `Skills Taxonomy Analysis and Figures.ipynb` - Visualisation and analysis of the skills taxonomy. Outputs are stored in `outputs/skills_taxonomy/figures/2021.09.06/`.
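One way to think about "transversal" skills is sketched below. This is an assumption, not the measure used in `Tranversal Skills.ipynb`: here a skill's transversality is the fraction of *other* skill groups it co-occurs with in at least one job advert. The function `transversality_scores` and the toy skills and group labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def transversality_scores(
    adverts: Iterable[Set[str]], skill_to_group: Dict[str, str]
) -> Dict[str, float]:
    """Hypothetical metric: fraction of other skill groups a skill co-occurs with."""
    all_groups = set(skill_to_group.values())
    co_groups: Dict[str, Set[str]] = defaultdict(set)
    for skills in adverts:
        present = set(skills)
        for skill in present:
            own_group = skill_to_group[skill]
            for other in present:
                other_group = skill_to_group[other]
                # Only groups different from the skill's own group count
                if other_group != own_group:
                    co_groups[skill].add(other_group)
    # Normalise by the number of groups the skill could co-occur with
    denominator = max(len(all_groups) - 1, 1)
    return {skill: len(co_groups[skill]) / denominator for skill in skill_to_group}


skill_to_group = {"python": "A", "sql": "A", "teamwork": "B", "driving": "C", "welding": "C"}
adverts = [{"python", "teamwork"}, {"sql", "teamwork", "driving"}, {"welding"}]
scores = transversality_scores(adverts, skill_to_group)
print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```

Sorting the scores gives the most and least transversal skills; averaging the scores of a group's skills would give a comparable figure per skill group.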

## `skills_taxonomy_application/`

This folder contains two notebooks that analyse the skills taxonomy in application to the job location and to whether the job was advertised pre- or post-COVID.

1. `Application - Geography.ipynb` - See how different skill groups in the taxonomy relate to the location of the job advert. Outputs in `outputs/skills_taxonomy_application/region_application/`.
2. `Application - COVID.ipynb` - See how different skill groups in the taxonomy relate to whether the job advert was posted pre- or post-COVID. Outputs in `outputs/skills_taxonomy_application/covid_application`.
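A minimal sketch of the kind of comparison the COVID notebook makes, assuming each advert is reduced to a posting date and the set of skill groups it mentions. The cut-off date of 1 March 2020 and the function `skill_group_shares_by_period` are hypothetical choices for illustration, not taken from the notebook.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, Set, Tuple

# Hypothetical cut-off between "pre" and "post" COVID adverts (an assumption)
COVID_CUTOFF = date(2020, 3, 1)


def skill_group_shares_by_period(
    adverts: Iterable[Tuple[date, Set[str]]], cutoff: date = COVID_CUTOFF
) -> Dict[str, Dict[str, float]]:
    """Share of adverts in each period that mention each skill group."""
    group_counts = {"pre": Counter(), "post": Counter()}
    advert_totals: Counter = Counter()
    for advert_date, groups in adverts:
        period = "pre" if advert_date < cutoff else "post"
        advert_totals[period] += 1
        # Count each skill group at most once per advert
        group_counts[period].update(set(groups))
    return {
        period: {g: c / advert_totals[period] for g, c in counts.items()}
        for period, counts in group_counts.items()
        if advert_totals[period] > 0
    }


adverts = [
    (date(2019, 6, 1), {"A", "B"}),
    (date(2019, 12, 1), {"A"}),
    (date(2020, 6, 1), {"B"}),
]
shares = skill_group_shares_by_period(adverts)
print(shares)
```

Using shares rather than raw counts keeps the two periods comparable when they contain different numbers of adverts; the same grouping could be done by region instead of period for the geography analysis.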

This folder also contains the script `locations_to_nuts2.py`, which converts the longitude and latitude coordinates of the job adverts to NUTS2 regional classifications; this was necessary for `Application - Geography.ipynb`.
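Mapping a coordinate to a region is a point-in-polygon lookup. The sketch below uses the ray-casting test on simple polygons and is only an illustration: `locations_to_nuts2.py` may instead use official NUTS boundary files and a geospatial library, which we have not checked. The functions `point_in_polygon` and `assign_nuts2` and the `TOY_WEST`/`TOY_EAST` regions are hypothetical and are not real NUTS2 codes or boundaries.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: count how many polygon edges a ray to the right crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the ray's latitude can be crossed
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_nuts2(lon: float, lat: float, regions: Dict[str, List[Point]]) -> Optional[str]:
    """Return the code of the first region containing the point, or None."""
    for code, polygon in regions.items():
        if point_in_polygon((lon, lat), polygon):
            return code
    return None


# Hypothetical toy regions, not real NUTS2 boundaries
regions = {
    "TOY_WEST": [(-2.0, 50.0), (0.0, 50.0), (0.0, 52.0), (-2.0, 52.0)],
    "TOY_EAST": [(0.0, 50.0), (2.0, 50.0), (2.0, 52.0), (0.0, 52.0)],
}
print(assign_nuts2(-1.0, 51.0, regions))
```

Real NUTS2 regions can be multipolygons with islands and holes, so a production version would need more than this single-ring test.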

This file was deleted.

This file was deleted.

