-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Scale up predicting skill sentences #26
Conversation
4237541
to
c8571f9
Compare
…ling of job advert data paths etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks liz! sorry for my delayed response on this.
Looks good I think! For the final scale up, I would put in doc strings etc. etc.
I took a look at the .py files and was able to run python skills_taxonomy_v2/pipeline/sentence_classifier/predict_sentence_class.py --config_path 'skills_taxonomy_v2/config/predict_skill_sentences/2021.07.19.yaml
I had a small issue with directories i.e. when you pass '/inputs/skills_classifier...' - I think it might be worth putting a path directory in a config/base.yaml file.
i.e. TRAINING_PATH: "/inputs/skills_classifier/"
MODEL_PATH: "/outputs/skills_classifier/models/"
then calling the path by:
model_path = str(PROJECT_DIR) + config["MODEL_PATH"]
I also couldn't find the outputs file so had to manually create it for .pkl outputs - not sure if I'm doing something wrong or if it might also be worth having that structure in place so it outputs nicely without manual input.
Let me know if there's anything else you need from me!
Thanks so much @india-kerle ! Really good points which I hadn't thought about. I've added an outputs folder gitkeep in 9238e58 and the PROJECT_DIR was in the folder init file, so I've pulled this from here instead of the base config file in 4c64fb9. Hope those look ok to you? In terms of docstrings - you are right they are needed, but I've got a feeling that these functions may still be in flux (e.g. I've got a plan to move all the s3 data getters to a separate location, and when you edit the sentence classifier there may be more changes to this script), so if you don't mind I'd prefer to add them in a later PR? |
@lizgzil looks good to me! doc strings etc. makes sense to include in a later PR :) I think it should be good to merge! |
Addressing #27
Scaling up predict sentences using sentence classifier.
Previously the sentence classifier code was written in a one-off kind of way. Now it's edited to take job adverts from S3 (or locally) and predict the skills sentences in them, outputting only the skill sentences.
The script
predict_sentence_class.py
is the main thing to review here. Reading my changes toSentence Classifier.md
will give an idea of how to use this.