Adding source name to the Specify Collection object table #461

Closed
jlegind opened this issue Dec 21, 2023 · 10 comments
Labels
backend; persistence (Database); post-processing (Specimen table export preparation for import using OpenRefine); requires-discussion; Specify (Related to (interactions with) the Specify SW system)

Comments

@jlegind
Contributor

jlegind commented Dec 21, 2023

What is the issue?

It would help in debugging and 'housekeeping' if the imported Digi app records had their original source file name attached in a separate field:

Source
"NHMD_PinnedInsects_20231121_16_16_SS_original.csv"

Detailed description of the issue.

If there is a discrepancy between imported records in Specify and what is in the 4.Archive directory, then having the source path would be a massive help.

Why is it needed/relevant?

We gain a certain amount of future proofing in that it addresses issues like the one above and anticipates unforeseen problems.

Give scenario(s) of why and when this could be relevant.

If a curator discovers something in Specify that is a little off the mark, we can go all the way back to the source to investigate. We have already agreed that the post-processing GREL scripts should be versioned as they evolve with business needs.
Adding a source field ties neatly into this, as it makes forensics much easier.

Estimate level of effort required.

easy

What could be the challenges ?

There does not seem to be a way to automatically add the file name to a column in OpenRefine. That means it has to be added manually in the OpenRefine interface, which is a trivial task.

What documentation is required?

The documentation file "import_protocol_postProcessing.md" will need to be updated.

@jlegind added the persistence, Specify, post-processing and requires-discussion labels on Dec 21, 2023
@PipBrewer
Collaborator

It would be good if we could see this in Specify

@FedorSteeman
Contributor

FedorSteeman commented Feb 13, 2024

After discussing this with @bhsi-snm: we are not sure how easy this is to do in OpenRefine. Perhaps @jlegind can conjure up a little utility program for adding this source file column? Then I'll repurpose a field in Specify to map it to. Or perhaps we should investigate an OpenRefine option?

@FedorSteeman
Contributor

As far as I can see, there isn't really a way for GREL to get the file name, or rather the OpenRefine project name. The only way I can think of is to add it manually. I would also recommend treating this as a tabular remark field (cf. #444) so we don't occupy any customizable text fields with it.

@FedorSteeman removed their assignment on Feb 19, 2024
@jlegind
Contributor Author

jlegind commented Feb 19, 2024

We already have a remarks column; the new column might be 'remark_source', which could hold, for example:
NHMD_PinnedInsects_20240119_15_40_RL_original.csv

@jlegind
Contributor Author

jlegind commented Feb 19, 2024

Question: Should remark_date be the date the export was made, or the date it was post-processed?

@FedorSteeman
Contributor

As a result of the implementation of #444, we already have a column "remark source", so I suggest you choose another name.
As you can see, for tabular remarks, we need three columns:

  • remark (the remark itself)
  • source (the source of the remark, in the case of ticket 444 this is "DaSSCo digitisation")
  • date (the date of the remarks, which in the case of ticket 444 corresponds to recorded date)

For the specimen level remarks field, these fields are just prefixed "remark", so you get "remark source" and "remark date".

Actually, using the term "source" for the filename of the data is confusing here; maybe it's better to use "datafile".

So that means the following column names:

  • datafile_remark (the filename)
  • datafile_source (a description of the raison d'être of the remark, e.g. "DaSSCo datafile" or some such)
  • datafile_date (I suppose this should be the date of export of the datafile)
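
To make the proposal concrete, here is a minimal Python sketch of what the three values could look like for one file. It is only an illustration under assumptions that are not settled in this thread: the helper name `datafile_columns` is hypothetical, the export date is assumed to be the first 8-digit token in the file name, and the ISO date format is a guess at what the import would accept.

```python
import re
from datetime import datetime

# Assumption: the export date is the first 8-digit token (YYYYMMDD) in names
# such as NHMD_PinnedInsects_20231121_16_16_SS_original.csv.
DATE_TOKEN = re.compile(r"_(\d{8})_")


def datafile_columns(filename, source="DaSSCo datafile"):
    """Return the three proposed tabular remark values for one data file.

    Hypothetical helper; the default source text follows the example
    wording suggested in this comment.
    """
    date = ""
    match = DATE_TOKEN.search(filename)
    if match:
        try:
            # Convert YYYYMMDD to an ISO date string (format is an assumption).
            date = datetime.strptime(match.group(1), "%Y%m%d").date().isoformat()
        except ValueError:
            # Eight digits that do not form a valid date: leave the field empty.
            date = ""
    return {
        "datafile_remark": filename,
        "datafile_source": source,
        "datafile_date": date,
    }
```

If the export date should come from somewhere other than the file name, only the date lookup would need to change.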

@bhsi-snm Do you approve of this proposal?

@jlegind
Contributor Author

jlegind commented Feb 20, 2024

> name of the data is confusing here; Maybe it's better to use "datafile".
>
> So that means the following column names;

Since we already have code for monitoring a directory, I could extend it to add "datafile_source" and "datafile_date" to the CSV export. This circumvents OpenRefine.
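
A rough sketch of what that extension might do for a single file, assuming a comma-separated UTF-8 export. The function name `add_datafile_columns` and its parameters are hypothetical and not taken from the actual monitoring script; the export date is passed in by the caller because its definition is still under discussion.

```python
import csv
from pathlib import Path

EXTRA_COLUMNS = ["datafile_remark", "datafile_source", "datafile_date"]


def add_datafile_columns(src, dst, export_date, source="DaSSCo datafile"):
    """Copy the CSV at src to dst, appending the datafile_* columns to every row.

    Returns the number of data rows written. Assumes a header row and
    comma-separated UTF-8 content.
    """
    src = Path(src)
    extra = {
        "datafile_remark": src.name,    # the original file name
        "datafile_source": source,      # description of where the remark comes from
        "datafile_date": export_date,   # supplied by the caller
    }
    with src.open(newline="", encoding="utf-8") as fin:
        reader = csv.DictReader(fin)
        original = list(reader.fieldnames or [])
        # Append only the columns that are not already present in the header.
        fieldnames = original + [c for c in EXTRA_COLUMNS if c not in original]
        rows = [{**row, **extra} for row in reader]
    with Path(dst).open("w", newline="", encoding="utf-8") as fout:
        writer = csv.DictWriter(fout, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Writing to a separate destination keeps the original export untouched, so it can still be compared against the archived copy.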

@jlegind
Contributor Author

jlegind commented Mar 18, 2024

The "datafile_source", "datafile_date" and "datafile_remarks" columns for the tabular remarks have been added through the monitoring script.

@jlegind closed this as completed on Mar 18, 2024
@jlegind reopened this on Mar 18, 2024
@jlegind
Contributor Author

jlegind commented Mar 20, 2024

See issue #492 on conditionally adding values in the remarks columns.

@AstridBVW
Collaborator

The monitoring script was not fully implemented before Jan left, so this has been made part of the post-processing GREL script instead (ticket #506).

4 participants