This repository contains a demo for visualising and interacting with datasets in the EIDC based on their metadata descriptions found in the metadata catalogue.
The following requirements should be installed:
- Python 3.12
Set up a new Python virtual environment and activate it:
python -m venv .venv
source .venv/bin/activate
Install the required packages from requirements.txt:
pip install -r requirements.txt
To run all the demos you will need to download the metadata from the EIDC catalogue and load it into a Chroma vector database. A convenience pipeline handles this task. The pipeline is defined in pipelines/index.yml and can be run using load_embeddings.py:
./load_embeddings.py
This script assumes you have activated a Python venv and installed the required dependencies. The ingestion pipeline downloads the metadata available in the EIDC catalogue, converts it, and stores it in a Chroma instance. Settings for the file path to the Chroma data, the collection to use, and which metadata fields to store are defined in config.yml.
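Once the ingestion has finished, it can be useful to confirm that the collection actually contains documents. The following is a minimal sketch rather than part of the repository: the path "chroma-data" and the collection name "eidc-data" are placeholders and should be replaced with the values set in config.yml. It also assumes the collection was created with Chroma's default embedding function; if a different embedding model was used at ingestion time, a text query made this way may return poor matches.

```python
def summarise_collection(collection, sample_query: str, n_results: int = 3) -> dict:
    """Return the number of stored documents and the ids closest to a query."""
    results = collection.query(query_texts=[sample_query], n_results=n_results)
    # Chroma returns one list of ids per query text; only one query is sent.
    return {"count": collection.count(), "ids": results["ids"][0]}


def main() -> None:
    # Imported here so the helper above can be used without chromadb installed.
    import chromadb

    # Placeholder values: use the path and collection name from config.yml.
    client = chromadb.PersistentClient(path="chroma-data")
    collection = client.get_collection("eidc-data")
    print(summarise_collection(collection, "river water quality"))
```

Calling main() from an activated venv prints the document count and a few matching ids. A count of zero suggests the ingestion pipeline wrote to a different path or collection than the one being inspected.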
All demos run using Streamlit. To start the visualisation demo use:
python -m streamlit run visualisation/visualisation_app.py
The demo should open automatically in your browser when you run Streamlit. If it does not, connect using: http://localhost:8501.
This application runs a retrieval-augmented generation (RAG) pipeline built with Haystack, Chroma and FastAPI, with a simple user interface built in Streamlit. The pipeline is defined in pipelines/llama3.1-rag-pipe.yml and can be seen below:
The RAG pipeline uses the llama3.1 model via Ollama. Check the Ollama website for the most recent setup guide; for brevity, the basic steps are:
Download and run the ollama installer shell script:
curl -fsSL https://ollama.com/install.sh | sh
Load the llama3.1 model into Ollama and check that it runs:
ollama run llama3.1
Use /bye to exit the Ollama shell. The llama3.1 model should now be available via the Ollama REST API at http://localhost:11434.
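To check that the REST API responds before starting the UI, a request can be sent to Ollama's /api/generate endpoint. The sketch below uses only the Python standard library and is not taken from the repository; the function names and the timeout value are illustrative assumptions.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"


def build_generate_request(prompt: str, model: str = "llama3.1",
                           base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    request = build_generate_request(prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # With "stream": False the server replies with a single JSON object.
        return json.loads(response.read())["response"]
```

For example, print(generate("Say hello in one sentence.")) should print a short reply if the model is loaded. A connection error usually means the Ollama service is not running on port 11434.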
To start the streamlit UI:
python -m streamlit run rag/rag_app.py
The user interface should then be available at http://localhost:8501.
This application performs basic NER (Named Entity Recognition) on an input query using spaCy. Any detected geopolitical place names (entities labelled GPE) are automatically geocoded using Nominatim through GeoPy and then displayed on a map using Folium. The NER results are also displayed using displaCy.
The application simply runs via streamlit:
python -m streamlit run map/map_app.py
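The flow described above (NER, then geocoding, then mapping) can be sketched outside Streamlit as follows. This is an illustration under assumptions, not the code in map/map_app.py: the spaCy model name en_core_web_sm, the user agent string, the initial map centre and the output file name are all placeholders chosen for the example.

```python
def gpe_names(entities) -> list:
    """Keep unique entity texts labelled GPE, preserving their original order."""
    names = []
    for text, label in entities:
        if label == "GPE" and text not in names:
            names.append(text)
    return names


def build_map(query: str, out_path: str = "map.html") -> list:
    """Detect place names in the query, geocode them and save a Folium map."""
    # Imported here so gpe_names can be used without these packages installed.
    import folium
    import spacy
    from geopy.geocoders import Nominatim

    nlp = spacy.load("en_core_web_sm")  # assumed model; must be downloaded first
    doc = nlp(query)
    names = gpe_names((ent.text, ent.label_) for ent in doc.ents)

    geocoder = Nominatim(user_agent="eidc-map-demo-example")  # placeholder agent
    fmap = folium.Map(location=[54.0, -2.0], zoom_start=5)  # arbitrary start view
    for name in names:
        location = geocoder.geocode(name)
        if location is None:  # Nominatim found no match for this name
            continue
        folium.Marker([location.latitude, location.longitude], popup=name).add_to(fmap)
    fmap.save(out_path)
    return names
```

Calling build_map("Rainfall data for Lancaster and Cardiff") writes map.html with a marker for each place that Nominatim can resolve. The public Nominatim service is rate limited, so queries containing many place names may need a delay between geocoding calls.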