Add database version #930

jmbenlloch · 2025-02-21T17:08:41Z

We have had many issues with the database recently, and analysts have often needed to know exactly which database version was used for files processed at LSC. They also needed to know whether anything had changed on the DB server.

I have implemented a system that logs every change to our databases (currently only DEMOPPDB and NEXT100DB, the detectors in use). It is basically a mirror of our database using Dolt. Whenever an update, delete or insert occurs, a hash and a timestamp are generated for those changes. This way, we can know for sure which tables and which rows have changed at any point in time.

This is already in place and is transparent to our usual operations. I think it would be useful to add the DB version to our local sqlite copy. This PR includes a function to do that: the download script will try to get the (hash, timestamp) of the database. It also includes a function to read the version from the local database. For complete traceability, I think every processed file should store the corresponding hash somewhere in the HDF5 file. That way we would know exactly which database was used for each file, and there would be no ambiguity.
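To make the local-copy idea concrete, here is a minimal sketch of storing and reading a (hash, timestamp) pair in a sqlite file. It is not the code in this PR: the helper names `store_db_version` and `load_db_version` are hypothetical, and the only things taken from the examples in this description are the `db_version` table name and its `version` and `date` columns.

```python
import sqlite3


def store_db_version(conn, version, date):
    """Replace the contents of db_version with a single (version, date) row.

    Hypothetical helper: the table layout is an assumption based on the
    column names shown in the examples of this description.
    """
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "CREATE TABLE IF NOT EXISTS db_version (version TEXT, date INTEGER)"
        )
        conn.execute("DELETE FROM db_version")
        conn.execute(
            "INSERT INTO db_version (version, date) VALUES (?, ?)",
            (version, date),
        )


def load_db_version(conn):
    """Return (version, date), or None if the table is missing or empty."""
    try:
        return conn.execute("SELECT version, date FROM db_version").fetchone()
    except sqlite3.OperationalError:
        # Raised by sqlite3 when the db_version table does not exist.
        return None


conn = sqlite3.connect(":memory:")
print(load_db_version(conn))  # no table yet
store_db_version(conn, "a6lq9jnaguhhvauqfs0p7ptjrtn1vl72", 1740148188)
print(load_db_version(conn))
```

Returning `None` for a missing table mirrors the "Database does not have db_version table" case, so callers can treat older local copies as unversioned rather than failing.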

Doing all that is quite a bit of work and I am not volunteering at all to do it. If any of you think this is interesting, find a volunteer and get it implemented. I am already giving you all the required tools.

Some examples of what the current code does:

In [5]: db.read_db_version(db.get_db("next100"))
Out[5]: 
                            version        date
0  a6lq9jnaguhhvauqfs0p7ptjrtn1vl72  1740148188

In [7]: db.read_db_version(db.get_db("demopp"))
Out[7]: 
                            version        date
0  idbsfat3k116ihh981f0fa9in787q61g  1740147366

In [8]: db.read_db_version(db.get_db("flex100"))
Database does not have db_version table
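The `date` values above look like Unix timestamps in seconds. That is an assumption on my side, not something the table itself states, but under it they can be turned into readable UTC times. The helper name `version_date_to_utc` is hypothetical.

```python
from datetime import datetime, timezone


def version_date_to_utc(date):
    """Interpret `date` as seconds since the Unix epoch (assumed) and return UTC."""
    return datetime.fromtimestamp(date, tz=timezone.utc)


# Value taken from the next100 example above.
print(version_date_to_utc(1740148188).isoformat())  # 2025-02-21T14:29:48+00:00
```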

Related to this (I am not sure where the best place to raise it is), I would also like to make a small change to the procedure for calibrations (or any other database-changing procedure). Right now, whoever updates the database is probably executing a bunch of INSERT SQL statements. That works perfectly, but to make better use of the database changelog I would prefer it to be done using transactions. The difference is that with transactions, even if there are many changes to some tables, they are executed atomically and only one hash is generated.

With the current approach, each INSERT generates a new DB version, which would make it much more difficult to make sense of the changes if we ever need to investigate an issue.

The proposed change is very simple: it only requires adding one line at the beginning and one at the end. For instance:

START TRANSACTION;
INSERT INTO....
INSERT...
[...]
COMMIT;

So the idea is to put all the inserts together between START TRANSACTION; and COMMIT;. That way, a PMT calibration would produce only one new database version instead of 60.
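If the INSERT statements are generated by a script, the wrapping can be done there as well. The following is only a sketch: `wrap_in_transaction` is a hypothetical helper, and the table and column names in the example are placeholders, not our real schema.

```python
def wrap_in_transaction(statements):
    """Join SQL statements into one script between START TRANSACTION and COMMIT.

    Each statement is normalised to end with exactly one semicolon;
    blank entries are skipped.
    """
    body = [s.strip().rstrip(";") + ";" for s in statements if s.strip()]
    return "\n".join(["START TRANSACTION;", *body, "COMMIT;"])


# Placeholder statements; ExampleCalib and its columns are made up.
inserts = [
    "INSERT INTO ExampleCalib (SensorID, Gain) VALUES (0, 1.0)",
    "INSERT INTO ExampleCalib (SensorID, Gain) VALUES (1, 1.1);",
]
print(wrap_in_transaction(inserts))
```

The resulting script can then be sent to the server in one go, so all the inserts land in a single database version.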

Read database version

eb8d02a
