-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upgrade script #452
Upgrade script #452
Conversation
Header properties shouldn't have setters. The format and version attributes should only ever be added on creation. A property setter should never be exposed, since it communicates that the user can edit the attributes freely, which they shouldn't since it would break the file header check. The version attribute is not overwritten if it exists when the method is called. This avoids resetting the version by calling the method on an old file with a new library, which can be done since "private" methods aren't really private. The same check is not made for format since it never changes. Removed timestamp updating from the setter methods since it's called in the constructor, which is only run on new file creation.
Instead of setting a flag, duplicate a couple of lines to make the differences between new and existing files clearer.
This file format version change reflects the following format changes: - Property values: no longer compound data types - DataFrameDimension: new dimension type for DataArray - File ID: new header attribute
Only if they're newer than 1.2.0, the version that introduced IDs
This reverts commit 7b3c338.
215f1e6
to
ee494de
Compare
Script to upgrade NIX files to newest file format version. For now just a stub.
This is a reference to the underlying H5Group for the metadata root container. It shouldn't be accessible as 'File.metadata'. The proper top level container for metadata is 'File.sections'.
This is a WIP and still needs to take care of old style Property objects. The script first checks the file header version to see if it's older than the current. If it is, it continues to collect tasks that need to be performed to complete the upgrade. Tasks are created using functions that check if a change is required and if so, they return a closure bound to the specific filename. The returned function is then run after the user confirms they want to apply the changes. If --force is specified, the changes are applied without prompting. All checks and changes are made in standalone functions that open the file using h5py and close it immediately after. This makes each change atomic (or atomic-ish) and reduces the chances that a file will be left in an invalid state. There's still a chance the user will stop the procedure after a change has been applied and before the file version in the header is updated. This will put the file in a sort of invalid state but rerunning the upgrade script should finalise the changes. Regardless of the possibilities of corruption or invalid states, the user is warned to make sure their files and data are backed up.
Extra attributes, if set, become new properties to avoid data loss. Uncertainties are added to the new uncertainty attribute of the property, but only if there is one unique uncertainty for all values in the old property. If it is not unique and each value has its own uncertainty, they are also added as an extra property.
8e9f00d
to
ecc5121
Compare
Forgot to mention: This PR includes #451. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lvgtm! Worked hard to find something to nitpick upon :)
nixio/cmd/upgrade.py
Outdated
def update_property_values(fname): | ||
""" | ||
Returns a closure that binds the filename. When the return value is | ||
called, it modifies rewrites all the metadata Property objects to the new |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing "and" like "modifies and rewrites"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OH NO!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
welcome
I had a thought about a potential edge case that might cause issues. If you supply the same file twice, the task collection procedure will put the same file on the list twice. This wont be a problem with the file ID and version update, but the property reformatting might cause issues. Most likely the script will crash when attempting to get the "uncertainty" component of the property's value, so it shouldn't cause any corruption. This will happen regardless of whether the file is specified twice on the command line, or if it's through a symlink or hardlink. |
Tested now and as I suspected it crashes when trying to get the first field of the compound data type in the property. I can imagine this might be annoying if someone is running an upgrade of all their files and they have a symlink to one somewhere. I guess the chances are low in general, but maybe we should guard against it. This isn't an issue if the same file is specified twice on the command line, since the tasks are indexed by the filename and that will get deduplicated. |
If you supply the same file twice (perhaps through a symlink), the task collection procedure will put the same file on the list twice. This wont be a problem with the file ID and version update, but the property reformatting might cause issues.The script will crash when attempting to get the "uncertainty" component of the property's value, so it shouldn't cause any corruption. Regardless, checking the property format again right before performing the change will prevent the issue.
Task preparation functions are responsible for returning the closure or None if it's not needed. The task is not added if it's not needed. Even if it is added, the task checks if it needs to make the chance right before making it (guards against multiple runs on the same file after task collection).
With the latest changes, the script is a bit more careful about what it adds to the tasks and what it performs. It will only add a task if it's needed (previously, the Property update task was added even if the number of Property objects that needed updating was 0) and before performing any change, it rechecks if it's needed. |
nixio/cmd/upgrade.py
Outdated
return update_props | ||
|
||
|
||
def create_property(hfile, name, dtype, data): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am afraid the unit got lost during upgrading
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh dear. Good catch!
Same for definition.
I'll add an extra step at the end to offer to run |
This PR is mainly for the upgrade script but also includes some fixes and cleanup.
The script first checks the file header version to see if it's older than the current. If it is, it continues to collect tasks that need to be performed to complete the upgrade.
Tasks are created using functions that check if a change is required and if so, they return a closure bound to the specific filename. The returned function is then run after the user confirms they want to apply the changes. If --force is specified, the changes are applied without prompting.
All checks and changes are made in standalone functions that open the file using h5py and close it immediately after. This makes each change atomic (or atomic-ish) and reduces the chances that a file will be left in an invalid state. There's still a chance the user will stop the
procedure after a change has been applied and before the file version in the header is updated. This will put the file in a sort of invalid state but rerunning the upgrade script should finalise the changes.
Regardless of the possibilities of corruption or invalid states, the user is warned to make sure their files and data are backed up.
If you want to test the script, you can use the files in this GIN repository: https://gin.g-node.org/achilleas/nix-files. This includes two files from the G-Node/nix-examples repository (which come from Jan) and two more I made, one using nixpy v1.4 and the other using v1.5 (current master).