Hi,

thanks for SFM, it is really a great framework :-)
We are currently testing the framework for the BeSocial project in Belgium and ran into some unwanted behavior for our use case. We found a solution by changing parts of SFM in our forks, e.g. https://github.com/SvenLieber/sfm-ui
The issue with central data storage
For legal and data-management reasons, the machine running the harvests is not the server that should store the collected data. Our first, naive solution was to mount the destination folder via SSHFS and use it as /sfm-data.
Regular reading and writing works perfectly. However, both Postgres and RabbitMQ insist on owning their respective subdirectories and run chown on them, which results in permission errors during startup.
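To find out whether a mount will accept this before starting the containers, a small probe can try to chown a throwaway file inside it. This is a minimal sketch: the function name is hypothetical, and the uid/gid to test are assumptions that should match the users your Postgres and RabbitMQ containers actually run as. Run it as root, since changing ownership to another user fails on any filesystem for an unprivileged user.

```python
import os
import tempfile


def can_chown(directory: str, uid: int, gid: int) -> bool:
    """Return True if a fresh file in `directory` can be chowned to uid:gid.

    A throwaway probe file is used so the directory itself is left untouched.
    Any OSError from chown (e.g. EPERM/EACCES on an SSHFS mount) counts as "denied".
    """
    fd, probe = tempfile.mkstemp(dir=directory, prefix=".chown-probe-")
    os.close(fd)
    try:
        os.chown(probe, uid, gid)
        return True
    except OSError:
        return False
    finally:
        # Always clean up the probe file, whether chown succeeded or not.
        os.remove(probe)
```

For example, `can_chown("/mnt/ssh-drive", 999, 999)`, where 999 is only a placeholder uid/gid.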
A more fine-grained solution
What fixed the issue for us was splitting /sfm-data into several more fine-grained volumes that follow the subdirectories of the SFM directory structure. This lets us outsource the sensitive parts, e.g. keeping collection sets on a mounted SSHFS drive and using a remote PostgreSQL database.
We adapted SFM to use /sfm-db-data, /sfm-mq-data, /sfm-export-data, /sfm-containers-data and /sfm-collection-set-data internally instead of /sfm-data. These folders are treated as root directories, and the usual subdirectories are still created inside them, e.g. /sfm-collection-set-data/collection_set. It is therefore still possible to keep everything in a single folder, as in the SFM default.
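The path mapping described above can be sketched as a lookup from data kind to root directory, with the original subdirectory name kept underneath. This is a hypothetical helper, not code from SFM or the fork; only the five roots and the collection_set subdirectory come from the text, and the other subdirectory names are assumptions.

```python
from pathlib import PurePosixPath

# Container-side roots that replace the single /sfm-data volume.
DATA_ROOTS = {
    "db": "/sfm-db-data",
    "mq": "/sfm-mq-data",
    "export": "/sfm-export-data",
    "containers": "/sfm-containers-data",
    "collection_set": "/sfm-collection-set-data",
}

# Subdirectory created under each root. Only "collection_set" is confirmed;
# the other names are assumptions for illustration.
SUBDIRS = {
    "db": "postgresql",
    "mq": "rabbitmq",
    "export": "export",
    "containers": "containers",
    "collection_set": "collection_set",
}


def data_path(kind: str, *parts: str, split: bool = True) -> str:
    """Build a container path for `kind`, under its own root (split layout)
    or under the single /sfm-data root (SFM default)."""
    root = DATA_ROOTS[kind] if split else "/sfm-data"
    return str(PurePosixPath(root, SUBDIRS[kind], *parts))
```

Because the subdirectory is kept in both layouts, mounting all five roots from the same host folder reproduces the familiar single-folder tree.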
Excerpt from .env:
# RabbitMQ is stored locally
DATA_VOLUME_MQ=/sfm-mq-data
# DB is set to a local docker volume, but we do not have a db instance
# we connect to a remote server via POSTGRES_HOST
DATA_VOLUME_DB=/sfm-db-data
# Data from SFM are stored on a remote server
DATA_VOLUME_EXPORT=/mnt/ssh-drive:/sfm-export-data
DATA_VOLUME_CONTAINERS=/mnt/ssh-drive:/sfm-containers-data
DATA_VOLUME_COLLECTION_SET=/mnt/ssh-drive:/sfm-collection-set-data
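Each DATA_VOLUME_* value is either a bare container path (Docker manages the storage) or a host:container pair (a bind mount). A minimal sketch for sanity-checking such a .env before `docker-compose up` could look like this; the function names are hypothetical, and it assumes the simple KEY=VALUE format shown above with no ":ro"/":rw" suffixes.

```python
from __future__ import annotations

# Container path each variable is expected to mount in the split layout.
EXPECTED_TARGETS = {
    "DATA_VOLUME_MQ": "/sfm-mq-data",
    "DATA_VOLUME_DB": "/sfm-db-data",
    "DATA_VOLUME_EXPORT": "/sfm-export-data",
    "DATA_VOLUME_CONTAINERS": "/sfm-containers-data",
    "DATA_VOLUME_COLLECTION_SET": "/sfm-collection-set-data",
}


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            values[key.strip()] = value.strip()
    return values


def split_volume(spec: str) -> tuple[str | None, str]:
    """Split a short-syntax volume spec into (host_source, container_path).

    "/sfm-mq-data" -> (None, "/sfm-mq-data")
    "/mnt/ssh-drive:/sfm-export-data" -> ("/mnt/ssh-drive", "/sfm-export-data")
    """
    source, sep, target = spec.partition(":")
    return (source, target) if sep else (None, source)


def check_volumes(values: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means every variable is set
    and points at its expected container root."""
    problems = []
    for key, expected in EXPECTED_TARGETS.items():
        if key not in values:
            problems.append(f"{key} is missing")
            continue
        _, target = split_volume(values[key])
        if target != expected:
            problems.append(f"{key} mounts {target!r}, expected {expected!r}")
    return problems
```

Note that three variables in the excerpt share the host folder /mnt/ssh-drive; this should only be safe as long as each root creates its own, differently named subdirectory.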
This certainly looks more confusing than a single /sfm-data volume, but it also offers more flexibility. The changes affect all SFM repositories, since they all use /sfm-data.
This is not the end of the story, though. The notifications from the monitoring of used disk space probably still need an update, since the DB directory may no longer be available.
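One way the monitoring could cope with this is to skip data roots that are not present locally instead of reporting them as errors. The sketch below is hypothetical and not SFM's actual monitoring code; it only uses the standard library's `shutil.disk_usage`.

```python
from __future__ import annotations

import os
import shutil


def usage_report(roots: dict[str, str]) -> dict[str, float]:
    """Return percent of disk space used per data root.

    Roots that do not exist locally (e.g. the DB directory when PostgreSQL
    runs on a remote host) are skipped rather than raising a false alarm.
    """
    report = {}
    for name, path in roots.items():
        if not os.path.isdir(path):
            continue  # nothing to monitor on this machine
        usage = shutil.disk_usage(path)
        if usage.total:
            report[name] = 100.0 * usage.used / usage.total
    return report
```

For example, `usage_report({"collection_set": "/sfm-collection-set-data", "db": "/sfm-db-data"})` would only report the collection-set root if the DB directory is absent.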
Another open issue is the connection to the remote database host, which should be secured with SSL. I hope this can be configured somewhere over here or here.
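sfm-ui is a Django application, and Django passes the `OPTIONS` of a PostgreSQL `DATABASES` entry on to the driver as libpq connection parameters, so `sslmode` and `sslrootcert` can be set there. This is a minimal sketch, not the actual sfm-ui settings: POSTGRES_HOST comes from the setup above, while POSTGRES_SSLMODE, POSTGRES_SSLROOTCERT and the other variable names and defaults are assumptions.

```python
import os


def database_settings(env=os.environ) -> dict:
    """Build a Django DATABASES dict for a remote PostgreSQL server over SSL.

    POSTGRES_HOST is required; the SSL-related variable names are hypothetical.
    """
    # "verify-full" checks the server certificate and host name; "require" only encrypts.
    options = {"sslmode": env.get("POSTGRES_SSLMODE", "verify-full")}
    root_cert = env.get("POSTGRES_SSLROOTCERT")
    if root_cert:
        options["sslrootcert"] = root_cert  # CA certificate used to verify the server
    return {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": env.get("POSTGRES_DB", "postgres"),
            "USER": env.get("POSTGRES_USER", "postgres"),
            "PASSWORD": env.get("POSTGRES_PASSWORD", ""),
            "HOST": env["POSTGRES_HOST"],
            "PORT": env.get("POSTGRES_PORT", "5432"),
            "OPTIONS": options,
        }
    }
```

In a settings module this would be used as `DATABASES = database_settings()`.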
Please let us know what you think about this solution :-)
Sven
Sven,
Thanks so much for this detailed description of how SFM could be configured better for an environment such as yours. The team thinks it's a great idea to split out the sfm-data configuration as you've proposed. I imagine you're not the only institution with these requirements. We're planning to put in some time on SFM in late Feb/early March and we'll include this issue in that work. Please feel free to submit a PR if you go ahead with adjusting the notifications or SSL database connection in the meantime.
Laura