Maybe improve our database #52
SQLModel might be another fit for an index DB: https://github.com/tiangolo/sqlmodel
Databases are useful when we need complex queries, e.g. "find all datasets linked to model A that are also linked to model B". Right now it is enough to work directly with the S3 files: the summary files exist essentially for search on the website, and we have a clear submission and publishing workflow in which each step is distinct. So we don't really need a dedicated database.

Separate files on S3 and a single database file are two different approaches. For now I would stick with S3, since it is much easier to change individual files without affecting all the records, whereas editing a database file is much less straightforward and requires more care with backups, migrations, etc. If we kept both S3 files and a database, we would be creating two sources of truth. If we do end up needing a database, e.g. for a hypha service offering advanced model search, I would build the database on the fly from the S3 files and keep S3 as the source of truth. Moreover, S3 itself supports SQL syntax for querying JSON files; see S3 Select: https://docs.aws.amazon.com/AmazonS3/latest/userguide/selecting-content-from-objects.html
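A minimal sketch of building such an index on the fly, assuming the summary files are JSON documents with hypothetical `id`, `type` and `links` fields (the real schema may differ). An in-memory SQLite database stands in for whatever store a search service would use; because the index is disposable and rebuilt from the S3 contents, S3 remains the single source of truth. The query answers the "datasets linked to both model A and model B" example.

```python
import json
import sqlite3

# Hypothetical summary documents standing in for JSON files fetched from S3.
# The field names ("id", "type", "links") are assumptions for illustration.
SUMMARIES = {
    "dataset-x/summary.json": json.dumps({"id": "dataset-x", "type": "dataset", "links": ["model-a", "model-b"]}),
    "dataset-y/summary.json": json.dumps({"id": "dataset-y", "type": "dataset", "links": ["model-a"]}),
    "model-a/summary.json": json.dumps({"id": "model-a", "type": "model", "links": []}),
}


def build_index(summaries):
    """Build a throwaway in-memory SQLite index from S3 summary files.

    S3 stays the source of truth; the index can be dropped and rebuilt at any time.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE resource (id TEXT PRIMARY KEY, type TEXT)")
    conn.execute("CREATE TABLE link (src TEXT, dst TEXT)")
    for text in summaries.values():
        doc = json.loads(text)
        conn.execute("INSERT INTO resource VALUES (?, ?)", (doc["id"], doc["type"]))
        conn.executemany(
            "INSERT INTO link VALUES (?, ?)", [(doc["id"], dst) for dst in doc["links"]]
        )
    return conn


def datasets_linked_to_all(conn, models):
    """Return ids of datasets that link to every model in `models`."""
    placeholders = ",".join("?" for _ in models)
    rows = conn.execute(
        f"""
        SELECT r.id FROM resource r JOIN link l ON l.src = r.id
        WHERE r.type = 'dataset' AND l.dst IN ({placeholders})
        GROUP BY r.id HAVING COUNT(DISTINCT l.dst) = ?
        ORDER BY r.id
        """,
        (*models, len(models)),
    ).fetchall()
    return [row[0] for row in rows]


conn = build_index(SUMMARIES)
print(datasets_linked_to_all(conn, ["model-a", "model-b"]))  # ['dataset-x']
```

Rebuilding the whole index is cheap at this scale and sidesteps migrations entirely: a schema change just means rebuilding from S3.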
Creating an "index database" on the fly was more what I had in mind... but in any case this should be left for future optimization.
Our database is now a series of files on S3.
So far this is sufficient, and the minio (Python) client (and our `Client` wrapper around it) allows convenient access to and inspection of our database. We might want to look into more standard approaches, so this issue serves as a place to take notes and, eventually, to discuss them.
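A sketch of reading the summary files with the minio Python client, assuming they are JSON objects under a common prefix. The bucket and prefix are placeholders, and this does not reproduce our `Client` wrapper; it only uses the minio calls `list_objects` and `get_object`.

```python
import json


def load_summaries(client, bucket, prefix="", suffix=".json"):
    """Read every JSON object under `prefix` into a dict keyed by object name.

    `client` is expected to behave like a `minio.Minio` instance
    (`list_objects` and `get_object`); bucket and prefix are placeholders.
    """
    summaries = {}
    for obj in client.list_objects(bucket, prefix=prefix, recursive=True):
        if not obj.object_name.endswith(suffix):
            continue  # skip non-JSON objects
        response = client.get_object(bucket, obj.object_name)
        try:
            summaries[obj.object_name] = json.loads(response.read())
        finally:
            # Close the response and return the connection to the pool.
            response.close()
            response.release_conn()
    return summaries
```

Against a real server, `client` would be something like `Minio("s3.example.com", access_key="...", secret_key="...")` (placeholder endpoint and credentials); the resulting dict can then feed an on-the-fly index.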
https://aws.amazon.com/de/blogs/big-data/building-and-maintaining-an-amazon-s3-metadata-index-without-servers/
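The linked post keeps a metadata index in sync by reacting to S3 event notifications (the post uses AWS Lambda and DynamoDB). A minimal sketch of the same idea, with an in-memory SQLite table standing in for DynamoDB and a hypothetical `object_index` schema: created objects are upserted and deleted objects are removed, so the index only ever mirrors S3.

```python
import sqlite3
from urllib.parse import unquote_plus


def handle_s3_event(event, conn):
    """Apply one S3 event notification to a metadata index table.

    Each object create/delete updates the index, so the index is derived
    from S3 and never becomes a second source of truth.
    """
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 notifications are URL-encoded ("+" for spaces).
        key = unquote_plus(record["s3"]["object"]["key"])
        if record["eventName"].startswith("ObjectCreated:"):
            size = record["s3"]["object"].get("size", 0)
            conn.execute(
                "INSERT OR REPLACE INTO object_index (bucket, key, size) VALUES (?, ?, ?)",
                (bucket, key, size),
            )
        elif record["eventName"].startswith("ObjectRemoved:"):
            conn.execute("DELETE FROM object_index WHERE bucket = ? AND key = ?", (bucket, key))
    conn.commit()


def make_record(name, key, size=0):
    """Build a trimmed-down S3 event record (hypothetical bucket name)."""
    return {"eventName": name, "s3": {"bucket": {"name": "my-bucket"}, "object": {"key": key, "size": size}}}


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE object_index (bucket TEXT, key TEXT, size INTEGER, PRIMARY KEY (bucket, key))"
)
handle_s3_event({"Records": [make_record("ObjectCreated:Put", "model-a/summary.json", 120),
                             make_record("ObjectCreated:Put", "model+b/summary.json", 80)]}, conn)
handle_s3_event({"Records": [make_record("ObjectRemoved:Delete", "model-a/summary.json")]}, conn)
print(conn.execute("SELECT key, size FROM object_index").fetchall())  # [('model b/summary.json', 80)]
```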