SS22 Project Advanced Web Technologies

Learning Technologies - Competence Extraction via ML / NLP

TUB FOKUS


Project Architecture


For more details

Installation (Recommended for x86 Platform)

  1. Install Docker according to the instructions on the Docker website

  2. Open the terminal and enter:

    sudo docker -v

    Check that Docker is installed successfully; if not, retry the previous step.

  3. Unzip awt-pj-ss22-learn-tech-2.zip and change into the project directory awt-pj-ss22-learn-tech-2

    Type the following command to build the Docker image from the Dockerfile:

    sudo DOCKER_BUILDKIT=1 docker build -t competence-extraction:1.0 ./

    All dependencies are downloaded automatically, which may take a while.

  4. Type the following command and you will see the competence-extraction image you just built:

    sudo docker images
  5. Enter the following command to start a container from this image, with port 8888 for Jupyter Notebook and port 5000 for the RESTful API:

    sudo docker run --user root -p 8888:8888 -p 5000:5000 competence-extraction:1.0

    You may need to change the host port if it is occupied. For example, using:

    sudo docker run --user root -p 8888:8888 -p 5001:5000 competence-extraction:1.0

    or

    sudo docker run --user root -p 8889:8888 -p 5000:5000 competence-extraction:1.0
  6. Open the last link printed on the terminal in a browser to access the source code.

    !!! Please save this link for future use !!!

    All existing datasets have already been processed, so you can skip this step and view the results directly via the RESTful API in the next steps, or follow the instructions below if you want to run the source code yourself.

  7. Press Ctrl+C to exit the Jupyter Notebook environment, then enter

    sudo docker ps -a

    to find the container that was just created and ran Jupyter Notebook, and note its <CONTAINER ID>.

  8. Start the container again:

    sudo docker start <CONTAINER ID>
  9. Use

    sudo docker exec -it <CONTAINER ID> bash 

    to open a shell inside the container.

  10. Enter

    sudo python awt-pj-ss22-learn-tech-2/src/app.py 

    to run the RESTful API, then open the first link printed on the terminal in your browser. A minimal Python query sketch follows this installation list.

  11. Press Ctrl+C to stop the RESTful API.

  12. You can also use the same link (the one you saved earlier) to start Jupyter Notebook again.

  13. Finally, enter

    exit

    to exit the container.
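
Once the RESTful API is running (step 10), you can also query it from Python instead of the browser. The snippet below is only a minimal smoke-test sketch, assuming the API listens on localhost:5000; the endpoint path /competencies is a hypothetical placeholder, so check the routes shown on the API's start page for the actual paths.

    import json
    import urllib.request

    BASE_URL = "http://localhost:5000"

    def get_json(path):
        # Send a GET request and decode the JSON response.
        with urllib.request.urlopen(BASE_URL + path) as resp:
            return json.loads(resp.read().decode("utf-8"))

    # "/competencies" is a hypothetical endpoint name; replace it with a real route.
    print(json.dumps(get_json("/competencies"), indent=2, ensure_ascii=False))

If you changed the host port in step 5 (e.g. -p 5001:5000), adjust BASE_URL accordingly.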

Installation (Recommended for ARM/M1 Platform)

  1. Install the Anaconda environment first.

  2. Download and install the Conda environment for M1:

    chmod +x ~/Downloads/Miniforge3-MacOSX-arm64.sh
    sh ~/Downloads/Miniforge3-MacOSX-arm64.sh
    source ~/miniforge3/bin/activate
  3. Some additional installations are also required (a quick import check is sketched after this list):

    Spacy:

    conda install -c conda-forge spacy
    python -m spacy download de_core_news_lg

    Tensorflow:

    Download the tensorflow_text wheel (tensorflow_text-2.9.0-cp39-cp39-macosx_11_0_arm64.whl), then:

    conda install -c apple tensorflow-deps
    python -m pip install tensorflow-macos
    python -m pip install tensorflow-metal
    python -m pip install Downloads/tensorflow_text-2.9.0-cp39-cp39-macosx_11_0_arm64.whl
    python -m pip install tensorflow_hub

    Neo4J:

    pip install neo4j

    RESTful API:

    pip install Flask~=2.1.2
    pip install flask-restx==0.5.1
    pip install werkzeug==2.1.2

    Jupyter Notebook:

    conda install -c conda-forge jupyter jupyterlab -y

    Pandas:

    conda install pandas
  4. Use Jupyter Notebook to run the source code:

    cd Downloads/awt-pj-ss22-learn-tech-2/src
    jupyter notebook

    All existing datasets have already been processed, so you can skip this step and view the results directly via the RESTful API in the next step.

    If you want to test the source code or have a new dataset, there are three different Jupyter Notebooks under the src/ path. A detailed description of them can be found in AWT_Report_IEEE.pdf. A few points to highlight:

    • Run each code block in the order Preprocessing.ipynb -> NLP.ipynb -> Neo4J.ipynb to see all the intermediate steps.
    • When importing the libraries, you may see a warning message; it only means the GPU is not configured for acceleration and does not affect normal use.
    • By default, the control course dataset is used (as the input value of the import_course function in Preprocessing.ipynb; processing the full course dataset takes more than an hour). To test other datasets, replace the input parameter of the import_course function.
    • In Neo4J.ipynb, due to the security settings of the cloud database we use, locally computed data cannot be imported into the database directly. It must first be uploaded to a publicly accessible HTTP or HTTPS server. We recommend using Google Drive and creating a sharing link; the get_google_file function in Neo4J.ipynb extracts the direct-download address of the data file from it and imports the data into the cloud database (see the import sketch after this list).
    • You can also use your own cloud database by replacing uri, user, and password in Neo4J.ipynb, or use your own network drive and upload data to the cloud database in a similar way.
    • Evaluation.ipynb under src/archive is only included as a potential evaluation tool and is not actually used.
  5. Press Ctrl+C to exit the Jupyter Notebook environment, then enter

    flask run --port=5001

    to run the RESTful API, and open the first link printed on the terminal in your browser.
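
After step 3, a quick way to verify the environment is to import the installed packages and run the German spaCy model once. This is only a sanity-check sketch, assuming the installations above succeeded; it is not part of the project code.

    import spacy
    import tensorflow as tf
    import tensorflow_hub   # noqa: F401  (import check only)
    import tensorflow_text  # noqa: F401  (import check only)
    import pandas           # noqa: F401  (import check only)

    # Load the German model installed in step 3 and tag a short sentence.
    nlp = spacy.load("de_core_news_lg")
    doc = nlp("Maschinelles Lernen ist ein Teilgebiet der künstlichen Intelligenz.")
    print([(token.text, token.pos_) for token in doc])

    # On M1, tensorflow-metal should expose the GPU as a device.
    print(tf.config.list_physical_devices())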
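
The cloud-import workflow described in step 4 can be summarized with a short sketch using the official neo4j Python driver. Everything below is illustrative: the uri, user, and password placeholders stand in for the credentials set in Neo4J.ipynb, the Google Drive URL is a template, and the CSV column names and node labels are hypothetical; the actual import logic lives in get_google_file and the surrounding notebook cells.

    from neo4j import GraphDatabase

    # Placeholders -- the real values are set in Neo4J.ipynb.
    uri = "neo4j+s://<your-instance>.databases.neo4j.io"
    driver = GraphDatabase.driver(uri, auth=("<user>", "<password>"))

    # The cloud database cannot read local files, so the data must be reachable
    # over HTTP/HTTPS, e.g. via a Google Drive direct-download link.
    csv_url = "https://drive.google.com/uc?export=download&id=<FILE_ID>"

    def import_csv(tx, url):
        # Hypothetical columns "course" and "competence"; adjust to the real export.
        tx.run(
            "LOAD CSV WITH HEADERS FROM $url AS row "
            "MERGE (c:Course {name: row.course}) "
            "MERGE (k:Competence {name: row.competence}) "
            "MERGE (c)-[:TEACHES]->(k)",
            url=url,
        )

    with driver.session() as session:
        session.execute_write(import_csv, csv_url)
    driver.close()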

Core Components:

Helpful Links: