This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Re-run some analysis, and investigate and fix tk sample (#78)
* Update filter tk data for new sample
* Use full job adverts sample in filter bulk data
* use set
* New method for filter metadata sample
* Append to list not add to dict
* Correct comma
* New method to get bulk metadata without having problems with duplicate job ids
* Add dependency
* Save out remainder data
* Add script to find the duplicated skill sentences
* fix save issue
* Add new figures and notebook analysis for new taxonomy
* Add extra info about sample size to readmes
* Use manual names for level A names in build taxonomy
* Add script to find all the tk data with no text field
* Get no text and full text counts
* Add length diagnostic
* Add script to get sentence skill preds for new job adverts and append to results
* Improve predict replacement
* Use chunks of job adverts for predicting extra skill sentences
* Add init to sentence classifier folder, needed for metaflow
* Add more diagnostic data to get_no_texts_tk_data.py
* Add extra data to the tk sample to replace sample found from expired files
* Add a print of the figure for the multiskill analysis