We have had many issues with the database recently, and analysts often needed to know exactly which database version was being used for files processed at LSC. They also needed to know whether any changes had happened on the DB server.
I have implemented a system to record a log of every change happening in our databases (currently only implemented for `DEMOPPDB` and `NEXT100DB`, the detectors in use). It is basically a mirror of our database using Dolt. Whenever any update, delete, or insert occurs, a hash and a timestamp are generated and associated with those changes. In this way, we can know for sure which tables and which rows have changed at any point in time. This is already implemented and is transparent to our usual operations.

I think it would be interesting to add the DB version to our local sqlite copy. This PR includes a function to do that: the download script will try to get the (hash, timestamp) of the database. It also includes a function to read the version from the local database. To have complete traceability, I think all processed files should include the corresponding hash somewhere in the HDF5 file. In this way we could know exactly which database was used for each file and there would be no ambiguity anymore.
Doing all that is quite a bit of work and I am not volunteering at all to do it. If any of you think this is interesting, find a volunteer and get it implemented. I am already giving you all the required tools.
Some examples of what the current code does:
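As a rough illustration of the kind of usage this enables, here is a minimal sketch of reading the version back from the local sqlite copy. The table name (`DatabaseVersion`), its columns, and the file name are hypothetical placeholders, not necessarily the names used in this PR:

```python
import sqlite3

def read_db_version(sqlite_file):
    """Read the latest (hash, timestamp) pair stored in the local sqlite copy.

    Assumes a hypothetical `DatabaseVersion` table with columns
    `Hash` and `Timestamp`; the real table name may differ.
    """
    with sqlite3.connect(sqlite_file) as conn:
        cursor = conn.execute(
            "SELECT Hash, Timestamp FROM DatabaseVersion "
            "ORDER BY Timestamp DESC LIMIT 1")
        return cursor.fetchone()

# Example: print the version recorded when the database was downloaded.
version = read_db_version("localdb.NEXT100DB.sqlite3")  # hypothetical file name
if version is not None:
    db_hash, db_timestamp = version
    print(f"DB version {db_hash} fetched at {db_timestamp}")
```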
Related to this (I am not sure where the best place to comment on this is), I would also like to propose a small change in the procedure used for calibrations (or any other database-changing procedure). Right now, whoever is updating the database is probably executing a bunch of `INSERT` SQL statements. That works perfectly, but to make better use of the database changelog I would prefer it to be done using transactions. The difference is that with transactions, even if there are many changes to some tables, they are executed atomically and only one hash is generated. In the current way, each `INSERT` generates a new DB version, so it would also be much more difficult to make sense of the changes if we need to investigate an issue sometime in the future. The proposed change is very simple; it only requires adding one line at the beginning and one at the end. For instance:
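A sketch of what this could look like; the table, columns, and values below are illustrative only, and the actual change proposed is just the first and last lines:

```sql
START TRANSACTION;

-- All the calibration INSERTs go here, e.g. one per PMT
-- (table and column names are hypothetical).
INSERT INTO PmtGain (SensorID, Gain, GainError) VALUES (0, 22.3, 0.4);
INSERT INTO PmtGain (SensorID, Gain, GainError) VALUES (1, 23.1, 0.5);
-- ... one INSERT per sensor ...

COMMIT;
```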
So the idea is to include all the inserts together between `START TRANSACTION;` and `COMMIT;`. Thus, we would get only one new database version with the PMT calibration instead of 60.