A suite of data mining, analytics, and visualization solutions to create an awesome dashboard for the Museum Barberini, Potsdam, in order to help them analyze and assess customer, advertising, and social media data!
This solution was originally developed as part of a Data Analytics project run as a cooperation between the Museum Barberini (MB) and the Hasso Plattner Institute (HPI) in 2019/20 (see Credits below). The project comprises a data mining pipeline that runs regularly on a server and feeds several visualization dashboards hosted in a Power BI app. For more information, see also the following resources:
- System architecture slides
- Original project description (organizational) (German)
- Original project description (technical) (German)
- Final press release (German)
- Official presentation video (mirror on YouTube)
While this solution has been tailored to the individual needs of the MB and the overall project has the structure of a majestic monolith, we think that it contains some features and components with great potential for reuse in other solutions. In particular, these include the following highlights:
- Gomus binding: Connectors and scrapers for accessing various data sources from the museum management system go~mus. See src/gomus and the relevant documentation.
- Apple App Store Reviews binding: Scraper for fetching all user reviews about an app in the Apple App Store. See src/apple_appstore and the relevant documentation.
- Visitor Prediction: Machine-Learning (ML) based solution to predict the future number of museum visitors by extrapolating historic visitor data. See src/visitor_prediction. Credits go to Georg Tennigkeit (@georgt99).
- Postal Code Cleansing: Collection of heuristics to correct address information entered by humans with errors. See src/_utils/cleanse_data.py. Credits go to Laura Holz (@lauraholz).
- Power BI Crash Tests: Load & crash tests for Power BI visualization reports. See https://github.com/LinqLover/pbi-crash-tests. Credits go to Christoph Thiede (@LinqLover).
Development is currently being continued on GitLab (private repo) but a mirror of the repository is available on GitHub.
If you are interested in reusing any part of our solution and have further questions, ideas, or bug reports, please do not hesitate to contact us!
Requirements:
- UNIX system

Please note that these instructions are optimized for Ubuntu/amd64. If you use a different configuration, you may need to adjust the toolchain installation (see install_toolchain.sh).
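To find out whether a machine matches the configuration these instructions are optimized for, a small check like the following may help. It is only a sketch: the helper name is_reference_platform is made up for this example, and it assumes that /etc/os-release is available (as on most current Linux distributions) and that amd64 is reported as x86_64 or amd64.

```shell
# Hypothetical helper (not part of the repository): succeed if the given
# distribution ID and CPU architecture look like Ubuntu/amd64.
is_reference_platform() {
    [ "$1" = "ubuntu" ] || return 1
    case "$2" in
        x86_64 | amd64) return 0 ;;
        *) return 1 ;;
    esac
}

# Read the distribution ID in a subshell so that the variables defined in
# /etc/os-release do not leak into the current shell.
if [ -r /etc/os-release ]; then
    os_id="$(. /etc/os-release; echo "${ID:-unknown}")"
else
    os_id="unknown"
fi
arch="$(uname -m)"

if is_reference_platform "$os_id" "$arch"; then
    echo "Ubuntu/amd64 detected: the toolchain script should be runnable directly."
else
    echo "Detected $os_id/$arch: review install_toolchain.sh before running it."
fi
```

The check only prints a hint and does not change anything on the machine.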
- Clone the repository using git:
  git clone https://github.com/Museum-Barberini/Barberini-Analytics.git
  - For best convenience, clone it into /root/barberini-analytics.
- Copy the secrets folder (which is not part of the repository) into /etc/barberini-analytics. From the secret_files subdirectory, you may omit files denoted as caches in the documentation.
- Set up the toolchain. See scripts/setup/install_toolchain.sh for how to do this. If you use Ubuntu/amd64, you can run the script directly. Use sudo to run the commands!
- Set up the docker network and add the current user to the docker user group. Do not run this script with sudo!
  ./scripts/setup/setup_docker.sh
- Make sure to set the timezone of the machine to match the timezone of the gomus server:
  sudo timedatectl set-timezone Europe/Berlin
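After setting the timezone, it can be verified with a short check like the one below. This is a sketch under assumptions: current_timezone is a hypothetical helper, the expected value is simply the one from the command above, and timedatectl show is only available on machines with a reasonably recent systemd (the sketch falls back to /etc/timezone otherwise).

```shell
# Hypothetical helper (not part of the repository): print the name of the
# machine's current timezone, or "unknown" if it cannot be determined.
current_timezone() {
    local tz=""
    if command -v timedatectl >/dev/null 2>&1; then
        # Older systemd versions lack the "show" verb; ignore that failure.
        tz="$(timedatectl show --property=Timezone --value 2>/dev/null || true)"
    fi
    if [ -z "$tz" ] && [ -r /etc/timezone ]; then
        tz="$(cat /etc/timezone)"
    fi
    echo "${tz:-unknown}"
}

expected_tz="Europe/Berlin"  # the value used in the command above
actual_tz="$(current_timezone)"

if [ "$actual_tz" = "$expected_tz" ]; then
    echo "Timezone is $actual_tz, as expected."
else
    echo "Timezone is $actual_tz, but $expected_tz was expected."
fi
```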
To use TLS encryption, we recommend using Let's Encrypt and certbot. Installation:
./scripts/setup/setup_letsencrypt.sh
Alternatively, just make sure that the following files are present and up to date in /var/barberini-analytics/db-data:
- server.crt
- server.key
See configuration for more information.
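If you manage the certificate files manually, a check along the following lines can confirm that both files are in place. This is a sketch, not part of the repository: check_tls_files and cert_valid_for_days are hypothetical helpers, the 30-day threshold is an arbitrary assumption, and treating "up to date" as "does not expire soon" is only one possible reading. The expiry check relies on the -checkend option of openssl x509.

```shell
# Hypothetical helper: succeed if server.crt and server.key both exist
# and are non-empty in the given directory.
check_tls_files() {
    local dir="$1" status=0 name
    for name in server.crt server.key; do
        if [ ! -s "$dir/$name" ]; then
            echo "missing or empty: $dir/$name" >&2
            status=1
        fi
    done
    return "$status"
}

# Hypothetical helper: succeed if the certificate stays valid for at least
# the given number of days (requires the openssl command line tool).
cert_valid_for_days() {
    local cert="$1" days="$2"
    openssl x509 -in "$cert" -noout -checkend "$((days * 86400))" >/dev/null
}

db_data="/var/barberini-analytics/db-data"
if check_tls_files "$db_data" 2>/dev/null && cert_valid_for_days "$db_data/server.crt" 30; then
    echo "TLS files look fine."
else
    echo "TLS files in $db_data are missing, empty, or expiring soon."
fi
```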
mkdir -p /var/barberini-analytics/db-data
make startup-db
Run scripts/setup/setup_db.sh.
To copy an existing database over from an old remote machine, use the following commands. This has not been tested for a long time!
ssh -C <oldremote> "docker exec barberini_analytics_db pg_dump -U postgres -C barberini | bzip2" | bunzip2 | docker exec -i barberini_analytics_db psql -U postgres
scp <oldremote>:/var/barberini-analytics/db-data/applied_migrations.txt /var/barberini-analytics/db-data/
./scripts/setup/setup_db_config.sh
Run sudo scripts/setup/setup_cron.sh.
If you cloned the repository into a folder other than /root/barberini-analytics, you may want to adapt the paths in scripts/setup/.crontab first.
If no crontab exists yet, create it using crontab -e.
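Adapting the paths can be previewed with a small helper like the one below. It is a sketch under assumptions: adapt_crontab_paths is a hypothetical helper, and it assumes that the template refers to the checkout by the literal path /root/barberini-analytics. It prints the adapted template to standard output instead of modifying the file, and it does not handle paths that contain the characters | or &.

```shell
# Hypothetical helper (not part of the repository): print the given crontab
# template with the default checkout path replaced by another path.
adapt_crontab_paths() {
    local template="$1" new_root="$2"
    sed "s|/root/barberini-analytics|${new_root}|g" "$template"
}

# Preview the adapted template when run from the repository root.
template="scripts/setup/.crontab"
if [ -r "$template" ]; then
    adapt_crontab_paths "$template" "$PWD"
fi

# crontab -l fails if the current user has no crontab yet.
if ! crontab -l >/dev/null 2>&1; then
    echo "No crontab found for $(id -un); create one with: crontab -e"
fi
```

Note that the second check looks at the crontab of the current user, so run it as the user for whom the cron jobs are installed.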
These instructions assume that you want to use a custom GitLab CI runner:
- Go to the GitLab CI/CD settings of your repository (e.g., https://gitlab.com/Museum-Barberini/Barberini-Analytics/-/settings/ci_cd#js-runners-settings), locate "New project runner", and choose "Show runner installation and registration instructions" from the menu. Rather than the instructions shown there, follow these instructions to install the runner via apt with an update path. See https://gitlab.com/gitlab-org/gitlab/-/issues/424394 for the inconsistency in the docs.
  Configuration:
  - To set up a new runner, use these options:
    - executor type: shell
  - To reuse the config of an existing runner, you may need to somehow cancel this dialog and reuse your existing /etc/gitlab-runner/config.toml file instead.
  Check whether the runner is displayed in the GitLab CI/CD settings.
- Add the gitlab-runner user to the docker group:
  sudo usermod -aG docker gitlab-runner
- Fix shell profile loading: Check whether /home/gitlab-runner/.bash_logout tries to clear the console, and if so, comment out the respective line. See https://docs.gitlab.com/runner/shells/index.html#shell-profile-loading for more information.
- Customize the runner config (/etc/gitlab-runner/config.toml) depending on your needs. This is what we use:
  -concurrent = 1
  +concurrent = 2
   # ...
   [[runners]]
  +  # WORKAROUND for permission issues. See: https://gitlab.com/gitlab-org/gitlab-runner/-/issues/2221
  +  # NOTE: Update the string to match the runner user's password. This requires the user to be in the sudo group.
  +  pre_clone_script = "echo <password_here> | sudo -S chown -f gitlab-runner:gitlab-runner -R /home/gitlab-runner/builds"
- Trigger a pipeline run to check whether the runner works.
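The shell profile check from the steps above can be sketched as follows. The helper logout_clears_console is hypothetical and deliberately simple: it reports any uncommented line that mentions "clear" (for example a call to clear_console, as found in the default Ubuntu .bash_logout), so it may produce false positives and should be treated as a hint only.

```shell
# Hypothetical helper (not part of the repository): succeed if the given
# logout script contains an uncommented line that mentions "clear".
logout_clears_console() {
    local file="$1"
    [ -r "$file" ] || return 1
    grep -v '^[[:space:]]*#' "$file" | grep -q 'clear'
}

target="/home/gitlab-runner/.bash_logout"
if logout_clears_console "$target"; then
    echo "$target seems to clear the console; comment out the respective line."
else
    echo "$target does not seem to clear the console (or is not readable)."
fi
```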
See CONFIGURATION.md.
Run scripts/setup/setup_dev.sh to set up the development environment.
Have a look at our beautiful Makefile!
To access the luigi docker, do:
make startup connect
Close the session by executing:
make shutdown
make docker-do do='make luigi-scheduler'
make luigi-frontend
This will also start a webserver on http://localhost:8000 where you can trace all running tasks.
To run the whole pipeline:
make docker-do do='make luigi'
Or, if you want to run a specific task, run the following in a make connect session:
make luigi-task LTASK=<task> LMODULE=<module> [LARGS=<task_args>] [MINIMAL=True]
If you see this error:
To modify production database manually, set BARBERINI_ANALYTICS_CONTEXT to the PRODUCTION constant.
In that case, either set the BARBERINI_ANALYTICS_CONTEXT environment variable to PRODUCTION or run the task against a test database:
export POSTGRES_DB=barberini_test
Run the tests in a make connect session:
make test
./scripts/tests/run_minimal_mining_pipeline.sh
Requirements for the Power BI dashboards:
- Windows 10

Download and install Power BI: https://powerbi.microsoft.com/downloads
See DOCUMENTATION.md.
See MAINTENANCE.md.
Authors: Laura Holz, Selina Reinhard, Leon Schmidt, Georg Tennigkeit, Christoph Thiede, Tom Wollnik (bachelor project BP-FN1 @ HPI, 2019/20).
Organizations: Hasso Plattner Institute, Potsdam; Museum Barberini; Hasso Plattner Foundation.