The main goal of this project is to monitor trends in the UK data science job market. For a write-up, see the Medium article "Python vs R: How to Analyse 4000 Job Advertisements Using Shiny & Machine Learning".
I originally started this project in 2018 to help me decide whether to learn Python or not. I now use it as motivation to keep learning Python!
Update (Jan 2025): there used to be a web application associated with this project, but my server suffered a hard-drive fault and I haven't had time to repair it yet.
Methods used:
- Web Scraping
- Data Visualization
- Topic Modelling (LDA)
- Web Application Development & Hosting
- Task Scheduling
Technologies:
- R
- Shiny
- Selenium
- Docker
- Linux
- Azure
The data source for this project is the JobServe website. Once a day, on a schedule, we perform the following steps:
- Scrape all 'Data Scientist' jobs from JobServe
- Pre-process the data, produce visualisations and build topic models on the job descriptions
- Present output using an interactive web application
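As a rough illustration of the pre-processing step, the sketch below counts how many job descriptions mention Python and how many mention R. It is written in Python using only the standard library and is not the project's own R code. The sample adverts, the `count_mentions` function and the keyword patterns are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical keyword patterns; the real project may tokenise differently.
# "R" is matched only as a standalone capital letter, so "R&D" is ignored.
PATTERNS = {
    "python": re.compile(r"\bpython\b", re.IGNORECASE),
    "r": re.compile(r"(?<![\w&])R(?![\w&])"),
}


def count_mentions(descriptions):
    """Count how many descriptions mention each keyword at least once."""
    counts = Counter({name: 0 for name in PATTERNS})
    for text in descriptions:
        for name, pattern in PATTERNS.items():
            if pattern.search(text):
                counts[name] += 1
    return counts


# Made-up adverts, used purely for illustration.
sample_ads = [
    "Data Scientist with strong Python and SQL skills.",
    "Experience of R or Python required; Shiny a plus.",
    "R&D analyst role, Excel only.",
]

if __name__ == "__main__":
    print(dict(count_mentions(sample_ads)))
```

Counting adverts rather than raw word occurrences keeps one very long description from dominating the comparison.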
The three distinct tasks each have their own folder:
- Scraping
- Analyse
- Shiny
Each task has its own Docker image and is launched on a schedule using cron.
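To make the scheduling idea concrete, the following Python sketch prints one crontab entry for each of the two batch tasks, staggered so that the analysis starts some time after the scrape. The image names and start times are hypothetical, since the project's actual crontab is not shown here. Only the five-field cron syntax and `docker run --rm` are standard.

```python
# Hypothetical image names and daily start times (hour, minute);
# the project's real values are not given in this README.
TASKS = [
    ("scraping", 1, 0),
    ("analyse", 3, 0),
]


def crontab_line(image, hour, minute):
    """Build a crontab entry that runs the given image once a day."""
    # Cron fields: minute hour day-of-month month day-of-week
    return f"{minute} {hour} * * * docker run --rm {image}"


def build_crontab(tasks=TASKS):
    """Join one crontab entry per task into a single block of text."""
    return "\n".join(
        crontab_line(image, hour, minute) for image, hour, minute in tasks
    )


if __name__ == "__main__":
    print(build_crontab())
```

The `--rm` flag removes each container once it exits, so daily runs do not leave stopped containers behind.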
For the Shiny app, we use Nginx as a reverse proxy and to encrypt all traffic with SSL. The Nginx folder contains the required config file.
Lastly, there are a number of helper shell scripts in the root directory that automate some of the repetitive tasks (docker run, docker compose up, etc.).
Follow the setup instructions