Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat: (pseudo-) unique hash for dataset #384

Closed
MerlinSmiles opened this issue Nov 8, 2016 · 9 comments
Closed

Feat: (pseudo-) unique hash for dataset #384

MerlinSmiles opened this issue Nov 8, 2016 · 9 comments

Comments

@MerlinSmiles
Copy link
Contributor

I'd like to implement a hash for each dataset, that is unique enough for our general applications, and short enough for humans to read :)

Reasons for that:

  • easily identify a dataset by its hash so that it is independent of folder structure and things like that
  • add that (human readable) hash to i.e. a plot or my logbook so that I am always sure that I know where the data was exactly
  • eventually in the far future have a way that I in my logbook have a link qc://notebook/S6N09W65ZL which loads a data-analysis notebook

If we think that is useful we'd need to make sure we can generate smart hashes, where it is unlikely that we have a double version.
We could:

import random
import string
N = 10
hash = ''.join(random.SystemRandom().choice(string.ascii_lowercase + string.digits) for _ in range(N))

I have no idea how unique it is though...

we can of course make a really really long string but thats hard to put in a plot title or something like that.

Any ideas/comments about that?

@peendebak
Copy link
Contributor

What about uuid?

Also: why a hash and not a timestring, perhaps with a letter for the location, e.g. qd20161109003910

The time will make it unique, and is human readable.

@AdriaanRol
Copy link
Contributor

@MerlinSmiles , @peendebak we also use the date_time format as an implicit unique ID. However we sometimes run into the problem that there are multiple files with the same date-timestamp (some experiments generate multiple datasets per second). I guess the uuid solves this by going to a finer resolution.

Another idea (that can be incorporated with the above) would be to use a certain aspect of git-hashes. The git hashes are generally long. However if you provide enough characters (~7) to be unique the software will automatically select the right hash. We could implement the same functionality for the qcodes hashes so that if you specify yymmddhhmm (and no seconds) it will still find it because there are no other datasets corresponding to it.

tl;dr

  • timestamp based hashes 👍
  • Allow reference by partial hash (like git hash)

@MerlinSmiles
Copy link
Contributor Author

@peendebak @eendebakpt @AdriaanRol
Yes I was looking into that uuid too, also apparently it can be shortened whith XORing parts of it and it should still be random. I had a thought to put a long and a short version in the snapshot (to not depend on a correct implementation of a shortening algorithm elsewhere).

I use timestamps currently in file / folder names and it works ok, but I might want to rename, reorganize and do stuff.
I am not very fond of timestamps for uniqueness, if you have many measurement setups and many datasets, there might be collisions.
Also a string based on characters and numbers will be much shorter (and readable) than timestamps.

@AdriaanRol
Copy link
Contributor

The hash does not need to be a part of the filename (that's just what we do now), so you should still be free to reorganize, rename and do stuff :).

I agree that shorter generally means more readable, however I think timestamps are generally way more readable than random hashes.
It would also ensure that the timestamp is always stored, allowing automating some kind of data retrieval system.

@giulioungaretti
Copy link
Contributor

giulioungaretti commented Nov 9, 2016

I'd just use the git way, as we talked before. 🖌️

@MerlinSmiles
Copy link
Contributor Author

Where would the right place be to generate and save that hash?
upon initialization of the dataset?
And then add it to the snapshot when saving? Or an empty file which filename is the hash, for easy finding later?

@alan-geller
Copy link

One thing to be careful about with random hashes is that you need a large hash to avoid the birthday problem. That's why GUIDs/UUIDs are so big, and cryptographic hashes are typically very large.

A timestamp plus a short random string often works well, unless you're generating a very large number of data sets each second.

And supporting substring match like Git makes a lot of sense no matter how we generate the unique IDs.

@AdriaanRol
Copy link
Contributor

unless you're generating a very large number of data sets each second.

@alan-geller Just to make sure no hidden assumptions slip into the framework, we generally do generate multiple datasets per second. I'm not worried about this creating any problems but just to be sure.

@WilliamHPNielsen
Copy link
Contributor

Closing this issue since the desired feature has been added in #1002 (and we no longer develop the old dataset).

bors bot added a commit that referenced this issue Sep 6, 2022
4559: Update towncrier requirement from ~=21.9.0 to ~=22.8.0 r=jenshnielsen a=dependabot[bot]

Updates the requirements on [towncrier](https://github.com/hawkowl/towncrier) to permit the latest version.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/hawkowl/towncrier/releases">towncrier's releases</a>.</em></p>
<blockquote>
<h2>Towncrier 22.8.0</h2>
<h1>towncrier 22.8.0 (2022-08-29)</h1>
<p>No significant changes since the previous release candidate.</p>
<h1>towncrier 22.8.0.rc1 (2022-08-28)</h1>
<h2>Features</h2>
<ul>
<li>
<p>Make the check subcommand succeed for branches that change the news file</p>
<p>This should enable the <code>check</code> subcommand to be used as a CI lint step and
not fail when a pull request only modifies the configured news file (i.e. when
the news file is being assembled for the next release). (<code>[#337](twisted/towncrier#337) &lt;https://github.com/hawkowl/towncrier/issues/337&gt;</code>_)</p>
</li>
<li>
<p>Added support to tables in toml settings, which provides a more intuitive
way to configure custom types. (<code>[#369](twisted/towncrier#369) &lt;https://github.com/hawkowl/towncrier/issues/369&gt;</code>_)</p>
</li>
<li>
<p>The <code>towncrier create</code> command line now has a new <code>-m TEXT</code> argument that is used to define the content of the newly created fragment. (<code>[#374](twisted/towncrier#374) &lt;https://github.com/hawkowl/towncrier/issues/374&gt;</code>_)</p>
</li>
</ul>
<h2>Bugfixes</h2>
<ul>
<li>
<p>The extra newline between the title and rendered content when using <code>--draft</code> is no longer inserted. (<code>[#105](twisted/towncrier#105) &lt;https://github.com/hawkowl/towncrier/issues/105&gt;</code>_)</p>
</li>
<li>
<p>The detection of duplicate release notes was fixed and recording changes of same version is no longer triggered.</p>
<p>Support for having the release notes for each version in a separate file is working again. This is a regression introduced in VERSION 19.9.0rc1. (<code>[#391](twisted/towncrier#391) &lt;https://github.com/hawkowl/towncrier/issues/391&gt;</code>_)</p>
</li>
</ul>
<h2>Improved Documentation</h2>
<ul>
<li>Improve <code>CONTRIBUTING.rst</code> and add PR template. (<code>[#342](twisted/towncrier#342) &lt;https://github.com/hawkowl/towncrier/issues/342&gt;</code>_)</li>
<li>Move docs too the main branch and document custom fragment types. (<code>[#367](twisted/towncrier#367) &lt;https://github.com/hawkowl/towncrier/issues/367&gt;</code>_)</li>
<li>The CLI help messages were updated to contain more information. (<code>[#384](twisted/towncrier#384) &lt;https://github.com/hawkowl/towncrier/issues/384&gt;</code>_)</li>
</ul>
<h2>Deprecations and Removals</h2>
<ul>
<li>Support for all Python versions older than 3.7 has been dropped. (<code>[#378](twisted/towncrier#378) &lt;https://github.com/hawkowl/towncrier/issues/378&gt;</code>_)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/twisted/towncrier/blob/trunk/NEWS.rst">towncrier's changelog</a>.</em></p>
<blockquote>
<h1>towncrier 22.8.0 (2022-08-29)</h1>
<p>No significant changes since the previous release candidate.</p>
<h1>towncrier 22.8.0.rc1 (2022-08-28)</h1>
<h2>Features</h2>
<ul>
<li>
<p>Make the check subcommand succeed for branches that change the news file</p>
<p>This should enable the <code>check</code> subcommand to be used as a CI lint step and
not fail when a pull request only modifies the configured news file (i.e. when
the news file is being assembled for the next release). (<code>[#337](twisted/towncrier#337) &lt;https://github.com/hawkowl/towncrier/issues/337&gt;</code>_)</p>
</li>
<li>
<p>Added support to tables in toml settings, which provides a more intuitive
way to configure custom types. (<code>[#369](twisted/towncrier#369) &lt;https://github.com/hawkowl/towncrier/issues/369&gt;</code>_)</p>
</li>
<li>
<p>The <code>towncrier create</code> command line now has a new <code>-m TEXT</code> argument that is used to define the content of the newly created fragment. (<code>[#374](twisted/towncrier#374) &lt;https://github.com/hawkowl/towncrier/issues/374&gt;</code>_)</p>
</li>
</ul>
<h2>Bugfixes</h2>
<ul>
<li>
<p>The extra newline between the title and rendered content when using <code>--draft</code> is no longer inserted. (<code>[#105](twisted/towncrier#105) &lt;https://github.com/hawkowl/towncrier/issues/105&gt;</code>_)</p>
</li>
<li>
<p>The detection of duplicate release notes was fixed and recording changes of same version is no longer triggered.</p>
<p>Support for having the release notes for each version in a separate file is working again. This is a regression introduced in VERSION 19.9.0rc1. (<code>[#391](twisted/towncrier#391) &lt;https://github.com/hawkowl/towncrier/issues/391&gt;</code>_)</p>
</li>
</ul>
<h2>Improved Documentation</h2>
<ul>
<li>Improve <code>CONTRIBUTING.rst</code> and add PR template. (<code>[#342](twisted/towncrier#342) &lt;https://github.com/hawkowl/towncrier/issues/342&gt;</code>_)</li>
<li>Move docs too the main branch and document custom fragment types. (<code>[#367](twisted/towncrier#367) &lt;https://github.com/hawkowl/towncrier/issues/367&gt;</code>_)</li>
<li>The CLI help messages were updated to contain more information. (<code>[#384](twisted/towncrier#384) &lt;https://github.com/hawkowl/towncrier/issues/384&gt;</code>_)</li>
</ul>
<h2>Deprecations and Removals</h2>
<ul>
<li>Support for all Python versions older than 3.7 has been dropped. (<code>[#378](twisted/towncrier#378) &lt;https://github.com/hawkowl/towncrier/issues/378&gt;</code>_)</li>
</ul>
<h2>Misc</h2>
<ul>
<li><code>[#292](twisted/towncrier#292) &lt;https://github.com/hawkowl/towncrier/issues/292&gt;</code><em>, <code>[#330](twisted/towncrier#330) &lt;https://github.com/hawkowl/towncrier/issues/330&gt;</code></em>, <code>[#366](twisted/towncrier#366) &lt;https://github.com/hawkowl/towncrier/issues/366&gt;</code><em>, <code>[#376](twisted/towncrier#376) &lt;https://github.com/hawkowl/towncrier/issues/376&gt;</code></em>, <code>[#377](twisted/towncrier#377) &lt;https://github.com/hawkowl/towncrier/issues/377&gt;</code><em>, <code>[#380](twisted/towncrier#380) &lt;https://github.com/hawkowl/towncrier/issues/380&gt;</code></em>, <code>[#381](twisted/towncrier#381) &lt;https://github.com/hawkowl/towncrier/issues/381&gt;</code><em>, <code>[#382](twisted/towncrier#382) &lt;https://github.com/hawkowl/towncrier/issues/382&gt;</code></em>, <code>[#383](twisted/towncrier#383) &lt;https://github.com/hawkowl/towncrier/issues/383&gt;</code><em>, <code>[#393](twisted/towncrier#393) &lt;https://github.com/hawkowl/towncrier/issues/393&gt;</code></em>, <code>[#399](twisted/towncrier#399) &lt;https://github.com/hawkowl/towncrier/issues/399&gt;</code><em>, <code>[#402](twisted/towncrier#402) &lt;https://github.com/hawkowl/towncrier/issues/402&gt;</code></em></li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="https://github.com/twisted/towncrier/commit/411f2676b18daf9ba1eb656f7dc6c726397c4a9e"><code>411f267</code></a> Update version for final release.</li>
<li><a href="https://github.com/twisted/towncrier/commit/f031dfa94e52e07b6c71e88cd5cd50e3803ad96c"><code>f031dfa</code></a> towncrier build --yes</li>
<li><a href="https://github.com/twisted/towncrier/commit/4c854f58a2074458cd66767aca9f3aaa34d842d8"><code>4c854f5</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/hawkowl/towncrier/issues/402">#402</a> from webknjaz/patch-2</li>
<li><a href="https://github.com/twisted/towncrier/commit/f911402b69960bee358b0c2011d685721afa9454"><code>f911402</code></a> Pin external github actions,</li>
<li><a href="https://github.com/twisted/towncrier/commit/a814202234214182855fe4a5ffb40c7f7aec2b04"><code>a814202</code></a> [pre-commit.ci] auto fixes from pre-commit.com hooks</li>
<li><a href="https://github.com/twisted/towncrier/commit/675219a6deac7c65bb94b72da55059b7077e5ec4"><code>675219a</code></a> Add a dummy change note for PR <a href="https://github-redirect.dependabot.com/hawkowl/towncrier/issues/402">#402</a></li>
<li><a href="https://github.com/twisted/towncrier/commit/b566bfb0dcdcd86fa9d349875eba15e647c06459"><code>b566bfb</code></a> Uninvent the wheel w/ <code>re-actors/alls-green</code> @ GHA</li>
<li><a href="https://github.com/twisted/towncrier/commit/5bf0a4653bb2adf52162550d436f1dc39f6e0446"><code>5bf0a46</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/hawkowl/towncrier/issues/396">#396</a> from twisted/292-more-ci-checks</li>
<li><a href="https://github.com/twisted/towncrier/commit/1bb62219a9e7b0d969292cae56104678b8bea151"><code>1bb6221</code></a> Merge branch 'trunk' into 292-more-ci-checks</li>
<li><a href="https://github.com/twisted/towncrier/commit/07baa2ba4b1159462dcb212395be1de91674698d"><code>07baa2b</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/hawkowl/towncrier/issues/395">#395</a> from twisted/384-cli-help</li>
<li>Additional commits viewable in <a href="https://github.com/hawkowl/towncrier/compare/21.9.0...22.8.0">compare view</a></li>
</ul>
</details>
<br />


You can trigger a rebase of this PR by commenting ``@dependabot` rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- ``@dependabot` rebase` will rebase this PR
- ``@dependabot` recreate` will recreate this PR, overwriting any edits that have been made to it
- ``@dependabot` merge` will merge this PR after your CI passes on it
- ``@dependabot` squash and merge` will squash and merge this PR after your CI passes on it
- ``@dependabot` cancel merge` will cancel a previously requested merge and block automerging
- ``@dependabot` reopen` will reopen this PR if it is closed
- ``@dependabot` close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- ``@dependabot` ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)


</details>

Changes needed for this

* Invert logic for single file. 22.8.0 fixed the logic such that single_file=False is correct when you have more than one file
* Pin versions of towncrier and sphinx towncrier to reflect this (so we get the right config)


Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jens H. Nielsen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants