SS22 Project Advanced Web Technologies

Learning Technologies - Competence Extraction via ML / NLP

TUB FOKUS


Project Architecture


For more details

Installation (Recommended for x86 Platform)

  1. Install Docker according to the instructions on the Docker website

  2. Open the terminal and enter:

    sudo docker -v

    Check that Docker is installed successfully; if not, retry the previous step.

  3. Unzip awt-pj-ss22-learn-tech-2.zip and change into the project directory awt-pj-ss22-learn-tech-2

    Type the following command to build the Docker image from the Dockerfile:

    sudo DOCKER_BUILDKIT=1 docker build -t competence-extraction:1.0 ./

    All dependencies are downloaded automatically, which may take a while.

  4. Type the following command and you will see the competence-extraction image you just built:

    sudo docker images
  5. Enter the following command to start a container from this image, with port 8888 for Jupyter Notebook and port 5000 for the RESTful API:

    sudo docker run --user root -p 8888:8888 -p 5000:5000 competence-extraction:1.0

    You may need to change the host port if it is occupied. For example, using:

    sudo docker run --user root -p 8888:8888 -p 5001:5000 competence-extraction:1.0

    or

    sudo docker run --user root -p 8889:8888 -p 5000:5000 competence-extraction:1.0
  6. Open the last link printed on the terminal in a browser to access the source code.

    !!! Please save this link for future use !!!

    All existing datasets have already been processed, so you can skip this step and view the results directly via the RESTful API in the next steps, or follow the instructions below if you want to run the source code yourself.

  7. Press Ctrl+C to exit the Jupyter Notebook environment, then enter

    sudo docker ps -a

    to find the container that was just created and ran Jupyter Notebook, and note its <CONTAINER ID>.

  8. Start the container again:

    sudo docker start <CONTAINER ID>
  9. Use

    sudo docker exec -it <CONTAINER ID> bash 

    to open a shell inside the container.

  10. Enter

    sudo python awt-pj-ss22-learn-tech-2/src/app.py 

    to run the RESTful API, then open the first link printed on the terminal in your browser. A minimal Python query sketch follows this installation list.

  11. Press Ctrl+C to stop the RESTful API.

  12. You can also use the same link (the one you saved earlier) to start Jupyter Notebook again.

  13. Finally, enter

    exit

    to exit the container.
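
Once the RESTful API is running (step 10), you can also query it from Python instead of the browser. The snippet below is only a minimal smoke-test sketch, assuming the API listens on localhost:5000; the endpoint path /competencies is a hypothetical placeholder, so check the routes shown on the API's start page for the actual paths.

    import json
    import urllib.request

    BASE_URL = "http://localhost:5000"

    def get_json(path):
        # Send a GET request and decode the JSON response.
        with urllib.request.urlopen(BASE_URL + path) as resp:
            return json.loads(resp.read().decode("utf-8"))

    # "/competencies" is a hypothetical endpoint name; replace it with a real route.
    print(json.dumps(get_json("/competencies"), indent=2, ensure_ascii=False))

If you changed the host port in step 5 (e.g. -p 5001:5000), adjust BASE_URL accordingly.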

Installation (Recommended for ARM/M1 Platform)

  1. Install the Anaconda environment first.

  2. Download and install the Conda environment for M1:

    chmod +x ~/Downloads/Miniforge3-MacOSX-arm64.sh
    sh ~/Downloads/Miniforge3-MacOSX-arm64.sh
    source ~/miniforge3/bin/activate
  3. Some additional installations are also required (a quick import check is sketched after this list):

    Spacy:

    conda install -c conda-forge spacy
    python -m spacy download de_core_news_lg

    Tensorflow:

    Download the tensorflow_text wheel (tensorflow_text-2.9.0-cp39-cp39-macosx_11_0_arm64.whl), then:

    conda install -c apple tensorflow-deps
    python -m pip install tensorflow-macos
    python -m pip install tensorflow-metal
    python -m pip install Downloads/tensorflow_text-2.9.0-cp39-cp39-macosx_11_0_arm64.whl
    python -m pip install tensorflow_hub

    Neo4J:

    pip install neo4j

    RESTful API:

    pip install Flask~=2.1.2
    pip install flask-restx==0.5.1
    pip install werkzeug==2.1.2

    Jupyter Notebook:

    conda install -c conda-forge jupyter jupyterlab -y

    Pandas:

    conda install pandas
  4. Use Jupyter Notebook to run the source code:

    cd Downloads/awt-pj-ss22-learn-tech-2/src
    jupyter notebook

    All existing datasets have already been processed, so you can skip this step and view the results directly via the RESTful API in the next step.

    If you want to test the source code or have a new dataset, there are three different Jupyter Notebooks under the src/ path. A detailed description of them can be found in AWT_Report_IEEE.pdf. A few points to highlight:

    • Run each code block in the order Preprocessing.ipynb -> NLP.ipynb -> Neo4J.ipynb to see all the intermediate steps.
    • When importing the libraries, you may see a warning message; it only means the GPU is not configured for acceleration and does not affect normal use.
    • By default, the control course dataset is used (as the input value of the import_course function in Preprocessing.ipynb; processing the full course dataset takes more than an hour). To test other datasets, replace the input parameter of the import_course function.
    • In Neo4J.ipynb, due to the security settings of the cloud database we use, locally computed data cannot be imported into the database directly. It must first be uploaded to a publicly accessible HTTP or HTTPS server. We recommend using Google Drive and creating a sharing link; the get_google_file function in Neo4J.ipynb extracts the direct-download address of the data file from it and imports the data into the cloud database (see the import sketch after this list).
    • You can also use your own cloud database by replacing uri, user, and password in Neo4J.ipynb, or use your own network drive and upload data to the cloud database in a similar way.
    • Evaluation.ipynb under src/archive is only included as a potential evaluation tool and is not actually used.
  5. Press Ctrl+C to exit the Jupyter Notebook environment, then enter

    flask run --port=5001

    to run the RESTful API, and open the first link printed on the terminal in your browser.
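
After step 3, a quick way to verify the environment is to import the installed packages and run the German spaCy model once. This is only a sanity-check sketch, assuming the installations above succeeded; it is not part of the project code.

    import spacy
    import tensorflow as tf
    import tensorflow_hub   # noqa: F401  (import check only)
    import tensorflow_text  # noqa: F401  (import check only)
    import pandas           # noqa: F401  (import check only)

    # Load the German model installed in step 3 and tag a short sentence.
    nlp = spacy.load("de_core_news_lg")
    doc = nlp("Maschinelles Lernen ist ein Teilgebiet der künstlichen Intelligenz.")
    print([(token.text, token.pos_) for token in doc])

    # On M1, tensorflow-metal should expose the GPU as a device.
    print(tf.config.list_physical_devices())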
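
The cloud-import workflow described in step 4 can be summarized with a short sketch using the official neo4j Python driver. Everything below is illustrative: the uri, user, and password placeholders stand in for the credentials set in Neo4J.ipynb, the Google Drive URL is a template, and the CSV column names and node labels are hypothetical; the actual import logic lives in get_google_file and the surrounding notebook cells.

    from neo4j import GraphDatabase

    # Placeholders -- the real values are set in Neo4J.ipynb.
    uri = "neo4j+s://<your-instance>.databases.neo4j.io"
    driver = GraphDatabase.driver(uri, auth=("<user>", "<password>"))

    # The cloud database cannot read local files, so the data must be reachable
    # over HTTP/HTTPS, e.g. via a Google Drive direct-download link.
    csv_url = "https://drive.google.com/uc?export=download&id=<FILE_ID>"

    def import_csv(tx, url):
        # Hypothetical columns "course" and "competence"; adjust to the real export.
        tx.run(
            "LOAD CSV WITH HEADERS FROM $url AS row "
            "MERGE (c:Course {name: row.course}) "
            "MERGE (k:Competence {name: row.competence}) "
            "MERGE (c)-[:TEACHES]->(k)",
            url=url,
        )

    with driver.session() as session:
        session.execute_write(import_csv, csv_url)
    driver.close()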

Core Components:

Helpful Links: