- Install Docker according to the instructions on the Docker website.
- Open a terminal and enter `sudo docker -v` to check whether Docker was installed successfully; if not, retry the previous step.
- Unzip `awt-pj-ss22-learn-tech-2.zip` and change into the project directory `awt-pj-ss22-learn-tech-2`. Type the following command to build the Docker image from the Dockerfile:
  `sudo DOCKER_BUILDKIT=1 docker build -t competence-extraction:1.0 ./`
  All dependencies are downloaded automatically, which will take a while.
- Type `sudo docker images` and you will see the `competence-extraction` image you just built.
- Enter the following command to start a container from this image, with port 8888 for Jupyter Notebook and port 5000 for the RESTful API:
  `sudo docker run --user root -p 8888:8888 -p 5000:5000 competence-extraction:1.0`
  You may need to change the host port if it is already occupied, for example:
  `sudo docker run --user root -p 8888:8888 -p 5001:5000 competence-extraction:1.0`
  or
  `sudo docker run --user root -p 8889:8888 -p 5000:5000 competence-extraction:1.0`
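If you are unsure whether a host port is already occupied before starting the container, here is a minimal check in Python (standard library only; 8888 and 5000 are simply the default ports from the command above):

```python
import socket

def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is currently listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex((host, port)) != 0  # non-zero: connection refused, so the port is free

for port in (8888, 5000):
    print(f"port {port}: {'free' if port_is_free(port) else 'occupied'}")
```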
- Use a browser to open the last link shown in the terminal so that you can access the source code.
  !!! Please save this link for future use !!!
  All existing datasets have already been processed, so you can skip this step and view the results directly with the RESTful API in the next step, or follow the instructions below if you want to run the source code.
- Press `Ctrl+C` to exit the Jupyter Notebook environment, then enter `sudo docker ps -a` to find the container that was just created and has already run Jupyter Notebook. Note the `<CONTAINER ID>` of this container.
- Start the container again:
  `sudo docker start <CONTAINER ID>`
- Use `sudo docker exec -it <CONTAINER ID> bash` to enter the terminal of the container.
- Enter `sudo python awt-pj-ss22-learn-tech-2/src/app.py` to run the RESTful API, and open the first link printed in the terminal in your browser.
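As a quick smoke test, you can also reach the API from the host with a few lines of Python once it is running. This sketch assumes the default port mapping from the `docker run` command above (container port 5000 exposed on host port 5000) and that the API's documentation page is served at the root path; adjust the URL if you remapped the port:

```python
# Smoke test from the host machine (not part of the project's source code).
from urllib.request import urlopen

with urlopen("http://localhost:5000/", timeout=10) as resp:
    print(resp.status)                       # 200 means the API is reachable
    print(resp.headers.get("Content-Type"))  # typically an HTML documentation page
```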
- Press `Ctrl+C` to close the RESTful API.
- You can also use the same link (the one you saved earlier) to start Jupyter Notebook again.
- Finally, enter `exit` to leave the container.
- To run the project natively instead (for example on an Apple M1 Mac), install the Anaconda environment first.
- Download the Miniforge installer, install the Conda environment for the M1, and activate it:
  `chmod +x ~/Downloads/Miniforge3-MacOSX-arm64.sh`
  `sh ~/Downloads/Miniforge3-MacOSX-arm64.sh`
  `source ~/miniforge3/bin/activate`
- Some additional installations are also required (a short verification sketch follows this list):
  - Spacy:
    `conda install -c conda-forge spacy`
    `python -m spacy download de_core_news_lg`
  - Tensorflow (download the tensorflow_text wheel first):
    `conda install -c apple tensorflow-deps`
    `python -m pip install tensorflow-macos`
    `python -m pip install tensorflow-metal`
    `python -m pip install Downloads/tensorflow_text-2.9.0-cp39-cp39-macosx_11_0_arm64.whl`
    `python -m pip install tensorflow_hub`
  - Neo4J:
    `pip install neo4j`
  - RESTful API:
    `pip install Flask~=2.1.2`
    `pip install flask-restx==0.5.1`
    `pip install werkzeug==2.1.2`
  - Jupyter Notebook:
    `conda install -c conda-forge jupyter jupyterlab -y`
  - Pandas:
    `conda install pandas`
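As referenced above, here is a minimal verification sketch, assuming all of the packages listed in this section were installed into the active conda environment. It is not part of the project; it only checks that the imports and the German spaCy model work:

```python
# Sanity check for the environment set up above.
import spacy
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text  # registers the ops required by multilingual TF-Hub models
import neo4j
import flask
import flask_restx
import pandas as pd

print("tensorflow", tf.__version__)
print("spacy", spacy.__version__)

# The German model downloaded with `python -m spacy download de_core_news_lg`
# should load and tokenize without errors.
nlp = spacy.load("de_core_news_lg")
doc = nlp("Dies ist ein kurzer Testsatz.")
print([token.text for token in doc])
```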
- Use Jupyter Notebook to run the source code:
  `cd Downloads/awt-pj-ss22-learn-tech-2/src`
  `jupyter notebook`
  All existing datasets have already been processed, so you can skip this step and view the results directly with the RESTful API in the next step.
  In case you want to test the source code or you have a new dataset, there are three different Jupyter Notebooks under the `src/` path. You can find a detailed description of them in AWT_Report_IEEE.pdf. Here are a few points to highlight:
  - Run each block of code in the notebooks in the order `Preprocessing.ipynb` -> `NLP.ipynb` -> `Neo4J.ipynb`; this lets you see all the intermediate steps.
  - When importing the libraries, a warning message may appear; it only means the GPU is not configured for acceleration and does not affect normal use.
  - By default the control course dataset is used (the input value of the `import_course` function in `Preprocessing.ipynb`; using the full course dataset takes more than an hour). If you want to test other datasets, replace the input parameters of the `import_course` function.
  - In `Neo4J.ipynb`, due to the security settings of the cloud database we use, locally computed data cannot be imported into the database directly. It must first be uploaded to a publicly accessible HTTP or HTTPS server. It is recommended to use Google Drive and create a sharing link. The `get_google_file` function in `Neo4J.ipynb` extracts the direct-access address of the data file from that link so it can be loaded into the cloud database (see the sketch after this list).
  - You can also use your own cloud database by replacing `uri`, `user`, and `password` in `Neo4J.ipynb`, or use your own network drive and upload to the cloud database in a similar way.
  - `Evaluation.ipynb` under `src/archive` is only included as a potential evaluation tool and is not actually used.
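For orientation, here is a minimal sketch of the two ideas mentioned above: turning a Google Drive sharing link into a direct-download URL, and connecting to a Neo4j cloud database with the official Python driver. The function below is only an illustration and is not the project's actual `get_google_file` implementation; the `neo4j+s://` URI, user, and password are placeholders that must be replaced with the values from `Neo4J.ipynb` or your own database:

```python
import re
from neo4j import GraphDatabase  # official driver, installed with `pip install neo4j`

def drive_direct_link(share_url: str) -> str:
    """Turn a Google Drive sharing link into a direct-download URL (illustrative only)."""
    file_id = re.search(r"/d/([\w-]+)", share_url).group(1)
    return f"https://drive.google.com/uc?export=download&id={file_id}"

print(drive_direct_link("https://drive.google.com/file/d/FILE_ID/view?usp=sharing"))

# Replace these placeholders with the uri / user / password used in Neo4J.ipynb
# (or with your own cloud database credentials).
uri, user, password = "neo4j+s://<host>", "neo4j", "<password>"
driver = GraphDatabase.driver(uri, auth=(user, password))
driver.verify_connectivity()  # raises an exception if the URI or credentials are wrong
driver.close()
```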
- Press `Ctrl+C` to exit the Jupyter Notebook environment, then enter `flask run --port=5001` to run the RESTful API, and open the first link printed in the terminal in your browser.
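If you just want to see how a flask-restx service of this kind is wired up, here is a minimal, self-contained sketch. It is not the project's `app.py`; the `/courses` endpoint is purely hypothetical, and the port simply matches the `flask run --port=5001` example above:

```python
# Minimal flask-restx example (NOT the project's app.py).
from flask import Flask
from flask_restx import Api, Resource

app = Flask(__name__)
api = Api(app, title="Competence Extraction API (example)")

@api.route("/courses")
class Courses(Resource):
    def get(self):
        # The real project would return results derived from the processed datasets.
        return {"courses": []}

if __name__ == "__main__":
    app.run(port=5001)
```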