Add database version #930

Open
wants to merge 1 commit into
base: master

@jmbenlloch jmbenlloch commented Feb 21, 2025

We have had many issues with the database recently, and analysts have often needed to know exactly which database version was used for files processed at LSC. They also needed to know whether any changes had been made on the DB server.

I have implemented a system that logs every change to our databases (currently only for DEMOPPDB and NEXT100DB, the detectors in use). It is basically a mirror of our database using Dolt. Whenever an update, delete or insert occurs, a hash and a timestamp associated with those changes are generated. This way, we can know for sure which tables and which rows have changed at any point in time.

This is already in place and is transparent to our usual operations. I think it would be interesting to add the DB version to our local sqlite copy. This PR includes a function to do that: the download script will try to get the (hash, timestamp) of the database. It also includes a function to read the version from the local database. To have complete traceability, I think every processed file should store the corresponding hash somewhere in the HDF5 file. This way we would know exactly which database was used for each file, and there would be no ambiguity anymore.
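To make the idea concrete, here is a minimal sketch of storing and reading a single (hash, timestamp) row in a local sqlite file. It is not the code in this PR: the helper names `store_version` and `load_version` are hypothetical, and the column types are an assumption. Only the table name `db_version` and the column names `version` and `date` are taken from the examples in this description.

```python
import sqlite3

def store_version(conn, version, date):
    """Keep exactly one (version, date) row in the db_version table."""
    with conn:  # commits on success, rolls back if any statement fails
        conn.execute(
            "CREATE TABLE IF NOT EXISTS db_version (version TEXT, date INTEGER)"
        )
        conn.execute("DELETE FROM db_version")
        conn.execute(
            "INSERT INTO db_version (version, date) VALUES (?, ?)",
            (version, date),
        )

def load_version(conn):
    """Return (version, date), or None if the table does not exist."""
    try:
        return conn.execute("SELECT version, date FROM db_version").fetchone()
    except sqlite3.OperationalError:  # raised for "no such table"
        return None

conn = sqlite3.connect(":memory:")  # stand-in for the local sqlite copy
missing = load_version(conn)        # no db_version table yet
store_version(conn, "a6lq9jnaguhhvauqfs0p7ptjrtn1vl72", 1740148188)
stored = load_version(conn)
```

Returning `None` for a missing table mirrors the flex100 case, where the local copy has no version information.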

Doing all that is quite a bit of work and I am not volunteering at all to do it. If any of you think this is interesting, find a volunteer and get it implemented. I am already giving you all the required tools.

Some examples of what the current code does:

In [5]: db.read_db_version(db.get_db("next100"))
Out[5]: 
                            version        date
0  a6lq9jnaguhhvauqfs0p7ptjrtn1vl72  1740148188

In [7]: db.read_db_version(db.get_db("demopp"))
Out[7]: 
                            version        date
0  idbsfat3k116ihh981f0fa9in787q61g  1740147366

In [8]: db.read_db_version(db.get_db("flex100"))
Database does not have db_version table
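The `date` values above look like Unix timestamps in seconds. That is an assumption on my part, but they decode to the day this PR was opened. A small sketch for turning them into readable UTC dates (the helper name `version_date_utc` is hypothetical):

```python
from datetime import datetime, timezone

def version_date_utc(timestamp):
    """Interpret a db_version date as seconds since the Unix epoch (assumed)."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)

next100_date = version_date_utc(1740148188)
demopp_date = version_date_utc(1740147366)

print(next100_date.isoformat())  # 2025-02-21T14:29:48+00:00
print(demopp_date.isoformat())   # 2025-02-21T14:16:06+00:00
```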

Related to this (I am not sure where the best place to raise it is), I would also like to make a small change to the procedure used for calibrations (or any other procedure that changes the database). Right now, whoever is updating the database is probably executing a bunch of INSERT SQL statements. That works perfectly, but to make better use of the database changelog I would prefer it to be done using transactions. The difference is that with transactions, even if there are many changes to some tables, they are executed atomically and only one hash is generated.

With the current approach, each INSERT generates a new DB version, which would also make it much harder to make sense of the changes if we ever need to investigate an issue in the future.
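As an illustration of what "atomically" means here, the sketch below uses Python's built-in `sqlite3` as a stand-in for the real server. The `gains` table and its values are hypothetical. It only shows the all-or-nothing behaviour of a transaction, not how Dolt generates its hashes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical calibration table, for illustration only.
conn.execute("CREATE TABLE gains (sensor_id INTEGER PRIMARY KEY, gain REAL)")

def apply_calibration(conn, rows):
    """Insert all rows in one transaction: either every row lands or none."""
    with conn:  # commit on success, roll back on any exception
        conn.executemany(
            "INSERT INTO gains (sensor_id, gain) VALUES (?, ?)", rows
        )

apply_calibration(conn, [(0, 24.1), (1, 25.3)])

try:
    # The second row reuses sensor_id 1, so the whole batch is rolled back,
    # including the valid row for sensor_id 2.
    apply_calibration(conn, [(2, 23.8), (1, 99.9)])
except sqlite3.IntegrityError:
    pass

n_rows = conn.execute("SELECT COUNT(*) FROM gains").fetchone()[0]
```

After the failed batch the table still holds only the two rows from the first call, so a half-applied calibration never becomes visible.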

The proposed change is very simple: it only requires adding one line at the beginning and one at the end. For instance:

START TRANSACTION;
INSERT INTO....
INSERT...
[...]
COMMIT;

So the idea is to include all the inserts together between START TRANSACTION; and COMMIT;. Thus, we would get only one new database version with the PMT calibration instead of 60.
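If the INSERT statements are generated by a script, the wrapping could be done automatically. A minimal sketch, assuming the statements are available as a list of strings (the helper name `wrap_in_transaction` and the example table are hypothetical):

```python
def wrap_in_transaction(statements):
    """Return one SQL script with all statements between START TRANSACTION and COMMIT."""
    lines = ["START TRANSACTION;"]
    for statement in statements:
        statement = statement.strip()
        if not statement:
            continue  # skip empty entries
        # Make sure every statement is terminated exactly once.
        lines.append(statement if statement.endswith(";") else statement + ";")
    lines.append("COMMIT;")
    return "\n".join(lines)

# Hypothetical statements, for illustration only.
script = wrap_in_transaction([
    "INSERT INTO gains VALUES (0, 24.1)",
    "INSERT INTO gains VALUES (1, 25.3);",
])
print(script)
```

The resulting script can then be sent to the server in one go, so the whole calibration shows up as a single entry in the changelog.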
