Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scale up predicting skill sentences #26

Merged
merged 17 commits into from
Jul 28, 2021
Merged

Conversation

lizgzil
Copy link
Contributor

@lizgzil lizgzil commented Jul 19, 2021


Addressing #27
Scaling up predict sentences using sentence classifier.

Previously the sentence classifier code was written in a one-off kind of way. Now it's edited to take job adverts from S3 (or locally) and predict the skills sentences in them, outputting only the skill sentences.

The script predict_sentence_class.py is the main thing to review here. Reading my changes to Sentence Classifier.md will give an idea of how to use this.

@lizgzil lizgzil force-pushed the scale-up-sentence-classifier branch from 4237541 to c8571f9 Compare July 19, 2021 12:22
@lizgzil lizgzil requested a review from india-kerle July 21, 2021 12:35
Copy link
Contributor

@india-kerle india-kerle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks liz! sorry for my delayed response on this.

Looks good I think! For the final scale up, I would put in doc strings etc. etc.

I took a look at the .py files and was able to run python skills_taxonomy_v2/pipeline/sentence_classifier/predict_sentence_class.py --config_path 'skills_taxonomy_v2/config/predict_skill_sentences/2021.07.19.yaml

I had a small issue with directories i.e. when you pass '/inputs/skills_classifier...' - I think it might be worth putting a path directory in a config/base.yaml file.
i.e. TRAINING_PATH: "/inputs/skills_classifier/"
MODEL_PATH: "/outputs/skills_classifier/models/"

then calling the path by:
model_path = str(PROJECT_DIR) + config["MODEL_PATH"]

I also couldn't find the outputs file so had to manually create it for .pkl outputs - not sure if I'm doing something wrong or if it might also be worth having that structure in place so it outputs nicely without manual input.

Let me know if there's anything else you need from me!

@lizgzil
Copy link
Contributor Author

lizgzil commented Jul 28, 2021

Thanks so much @india-kerle ! Really good points which I hadn't thought about. I've added an outputs folder gitkeep in 9238e58 and the PROJECT_DIR was in the folder init file, so I've pulled this from here instead of the base config file in 4c64fb9. Hope those look ok to you?

In terms of docstrings - you are right they are needed, but I've got a feeling that these functions may still be in flux (e.g. I've got a plan to move all the s3 data getters to a separate location, and when you edit the sentence classifier there may be more changes to this script), so if you don't mind I'd prefer to add them in a later PR?

@india-kerle
Copy link
Contributor

@lizgzil looks good to me! doc strings etc. makes sense to include in a later PR :) I think it should be good to merge!

@lizgzil lizgzil merged commit c04e43d into dev Jul 28, 2021
@lizgzil lizgzil deleted the scale-up-sentence-classifier branch July 28, 2021 13:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants