Images missing for multiple NHMD Pinned Insects & Herbarium exports #113

Closed
beckerah opened this issue Sep 18, 2024 · 12 comments
Assignees
Labels
image issue (issues with images), ingestion issue (issue ingesting images), NHMD (Natural History Museum Denmark)

Comments

@beckerah

beckerah commented Sep 18, 2024

Description:
While checking exports for NHMD Pinned Insects, I came across a few exports whose images I couldn't easily find. After some research, it seems these specimens were digitized while the ingestion client was producing errors and not ingesting images correctly. I read through the Slack messages (channel: ingestion_client_users) concerning the issue, and it sounds like the images were all successfully ingested on July 29-30, 2024. Looking through those image folders, I was able to find some but not all of the images. I am still unable to locate images for the following exports:

  • NHMD_PinnedInsects_20240722_14_41_JMJ_original_copy.csv (Barcodes: 1739481-1751790)
  • NHMD_PinnedInsects_20240724_15_59_MJG_original.csv (Barcodes: 1740064-1740140)

The first export has some data in it that needs to be checked against the image so I'm unable to proceed with that until we locate the images.

Next steps:
I need to ask Khaled and Bhupjit if they have any leads as to where the images could be.

@beckerah beckerah self-assigned this Sep 18, 2024
@beckerah beckerah transferred this issue from NHMDenmark/Mass-Digitizer Sep 18, 2024
@beckerah beckerah added the NHMD (Natural History Museum Denmark), ingestion issue (issue ingesting images), and image issue (issues with images) labels Sep 18, 2024
@beckerah
Author

Images for this export were found:

  • NHMD_PinnedInsects_20240724_15_59_MJG_original.csv (Barcodes: 1740064-1740140)

Still missing images for:

  • NHMD_PinnedInsects_20240722_14_41_JMJ_original_copy.csv (Barcodes: 1739481-1751790)

@beckerah
Author

beckerah commented Sep 23, 2024

Images for both exports have now been located.

However, I am having trouble locating images for:

  • NHMD_PinnedInsects_20240716_14_47_RL_processed.tsv (254 out of 348 images missing, others in a HERB folder)
  • NHMD_PinnedInsects_20240718_11_57_AI_processed.tsv (77 out of 123 images missing)

@beckerah beckerah changed the title from "Images missing for two NHMD Pinned Insects exports" to "Images missing for multiple NHMD Pinned Insects & Herbarium exports" Sep 25, 2024
@beckerah
Author

I'm also having trouble finding images to match NHMD Herbarium exports from July. I've now got a script running to match barcodes to GUIDs in a database. It will take a few days to go through all the folders and add everything to the database, but once it's finished, we should be able to see where all of the images are (assuming they exist on the N drive).
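For anyone following along, here is a minimal sketch of what a barcode-to-GUID index like this could look like. It assumes SQLite as the database and assumes the barcode for each image has already been read by a separate step. The table and column names (`image_index`, `guid`, `barcode`, `folder`) are illustrative only and are not the schema of the actual script.

```python
import sqlite3


def open_index(path=":memory:"):
    """Open (or create) the index; one row per image GUID."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS image_index (
               guid    TEXT PRIMARY KEY,
               barcode TEXT,
               folder  TEXT NOT NULL
           )"""
    )
    # Lookups go from barcode to image, so index that column as well.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_barcode ON image_index (barcode)"
    )
    return conn


def add_image(conn, guid, barcode, folder):
    """Record which barcode an image shows and which folder it lives in."""
    conn.execute(
        "INSERT OR REPLACE INTO image_index (guid, barcode, folder) "
        "VALUES (?, ?, ?)",
        (guid, barcode, folder),
    )
    conn.commit()


def find_images(conn, barcode):
    """Return (guid, folder) pairs for every image of the given barcode."""
    rows = conn.execute(
        "SELECT guid, folder FROM image_index WHERE barcode = ? ORDER BY guid",
        (barcode,),
    )
    return rows.fetchall()
```

With an index like this, locating the images for a barcode is a single `find_images` call, and a barcode with no rows is a candidate for the missing list.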

@beckerah
Author

I am still finding exports that I cannot locate matching images for. These were all from the end of July 2024. Pip suggested I ask Khaled and/or Bhupjit if they have any ideas about where these images could be. After gaining their insight, I will either:

  • hold off on processing this data for Specify until we can locate the images
  • divide the spreadsheets and only import the records we're certain are accurate

It's possible that these specimens may need to be re-imaged.

@beckerah
Author

I was able to resume using an old script that matches image GUIDs with their respective barcodes in a database table. It will take a while to go through all the folders but already I've located some of the missing images using the database.

Additionally, Rebekka found some images on the WORKHERB0003 workstation from July and August that were not ingested due to errors. She is ingesting them now. Hopefully some of the missing herbarium images are in that set.

@beckerah
Author

beckerah commented Oct 15, 2024

Still missing at least some images from the following exports:

  • NHMD_Herba_20240724_14_31_SS_JMJ (some images ingested 8/6 at herb0003 but not all)
      • 1301948 missing
  • NHMD_Herba_20240725_15_17_SS

@beckerah
Author

Bhupjit suggested we chat about this with Khaled next Thursday when we're all in the office.

@beckerah
Author

beckerah commented Nov 6, 2024

I'm still updating the databases with the barcodes and GUIDs. Hopefully it won't take much longer to get them completely up to date. I modified the original barcode-GUID matching script to check whether the GUID already exists in the database before it processes the image, which has sped things up a bit. I will need to enter some barcodes manually because the processing package doesn't catch all of them. Once all of that is done, I can use another script I'm working on to check all the barcodes from DigiApp exports against the barcode-GUID databases. That should return a full list of missing images.
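The skip-if-already-indexed change could be sketched roughly as below. This is an illustration under assumptions, not the real script: it assumes SQLite, assumes each image file is named by its GUID, and takes the barcode reader as a parameter (`read_barcode`, hypothetical) that returns `None` when no barcode is detected. The `image_index` table name is also made up.

```python
import sqlite3
from pathlib import Path


def guid_known(conn, guid):
    """True if this GUID is already in the index (cheap primary-key lookup)."""
    row = conn.execute(
        "SELECT 1 FROM image_index WHERE guid = ?", (guid,)
    ).fetchone()
    return row is not None


def index_folder(conn, folder, read_barcode):
    """Index new images in `folder`; return GUIDs that need manual entry."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS image_index "
        "(guid TEXT PRIMARY KEY, barcode TEXT, folder TEXT)"
    )
    needs_manual = []
    for image_path in sorted(Path(folder).iterdir()):
        if not image_path.is_file():
            continue
        guid = image_path.stem  # assumption: the file name is the GUID
        if guid_known(conn, guid):
            continue  # already indexed, so skip the slow barcode read
        barcode = read_barcode(image_path)  # may be None if nothing detected
        conn.execute(
            "INSERT INTO image_index (guid, barcode, folder) VALUES (?, ?, ?)",
            (guid, barcode, str(folder)),
        )
        if barcode is None:
            needs_manual.append(guid)
    conn.commit()
    return needs_manual
```

The returned list doubles as the to-do list for manual barcode entry, and a second run over the same folder does no image processing at all.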

Here's the task list to make it easier for me to see where I'm at:

  • Update all barcode-guid matching dbs (one for each workstation)
  • Manually update dbs with missing barcodes
  • Run script to check barcodes from DigiApp exports against dbs
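The last step in the list above amounts to a set difference between the barcodes in an export and the barcodes indexed across the workstation databases. A rough sketch, assuming the export is a delimited text file and that the barcode column is called `catalognumber` (a placeholder name, the real DigiApp column may differ):

```python
import csv
import io


def export_barcodes(handle, column="catalognumber", delimiter=","):
    """Collect the non-empty barcodes from one export file handle."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    return {
        (row.get(column) or "").strip()
        for row in reader
        if (row.get(column) or "").strip()
    }


def missing_images(exported, indexed_per_workstation):
    """Barcodes in the export that appear in none of the workstation dbs."""
    indexed = set().union(*indexed_per_workstation)
    return sorted(exported - indexed)


# Usage with made-up data; pass delimiter="\t" for the .tsv exports.
sample = io.StringIO("catalognumber,taxon\n1740064,a\n1740065,b\n")
print(missing_images(export_barcodes(sample), [{"1740064"}, set()]))
```

Each entry in `indexed_per_workstation` would be the set of barcodes pulled from one workstation's database.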

@beckerah
Author

I've manually updated the dbs for NHMD pinned insects and run the script for the date range July 1, 2024 - October 1, 2024 and it looks like the process works. I discovered two barcodes that don't have matching images.
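Since the export file names carry a date stamp (for example `NHMD_PinnedInsects_20240722_14_41_JMJ_original_copy.csv`), one way to restrict a run to a date range is to parse that stamp. This is only a sketch: the pattern is inferred from the file names quoted in this issue, and treating both ends of the range as inclusive is an assumption.

```python
import re
from datetime import date, datetime

# Assumed pattern: ..._YYYYMMDD_HH_MM_<initials>...
STAMP = re.compile(r"_(\d{8})_\d{2}_\d{2}_")


def export_date(filename):
    """Return the date embedded in an export file name, or None."""
    match = STAMP.search(filename)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y%m%d").date()
    except ValueError:
        return None  # eight digits that are not a real calendar date


def in_range(filename, start, end):
    """True if the export's date falls within start..end (inclusive)."""
    stamp = export_date(filename)
    return stamp is not None and start <= stamp <= end


# Usage: keep only exports from July 1 to October 1, 2024.
names = ["NHMD_Herba_20240725_15_17_SS", "notes.txt"]
print([n for n in names if in_range(n, date(2024, 7, 1), date(2024, 10, 1))])
```

Extending the check to all of 2024 would then only mean changing the two dates.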

I'm now working on NHMD Herbarium. Once I've finished this initial date range, I'll extend the check to all of 2024 (and possibly 2023 as well).

@beckerah
Author

I've now got a list of over 1,000 specimens from the date range July 1 - October 1, 2024 that have no associated images. There were clearly a few folders that did not get ingested from the end of July. Now I need to match up these barcodes with their locations from the DigiApp exports so I can get a workable list together for the digitizers. I also need to find out if we have a re-imaging policy in place already or if that is still in development.
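Matching the missing barcodes back to their storage locations could be done along these lines. Again a sketch only: the column names `catalognumber` and `storagename` are placeholders for whatever the DigiApp exports actually call the barcode and location fields.

```python
import csv
import io
from collections import defaultdict


def reimaging_list(export_rows, missing,
                   barcode_col="catalognumber", location_col="storagename"):
    """Group missing barcodes by storage location for the digitizers."""
    by_location = defaultdict(list)
    for row in export_rows:
        barcode = (row.get(barcode_col) or "").strip()
        if barcode in missing:
            location = (row.get(location_col) or "").strip() or "UNKNOWN"
            by_location[location].append(barcode)
    # Sort locations and barcodes so the list is easy to work through.
    return {loc: sorted(codes) for loc, codes in sorted(by_location.items())}


# Usage with made-up rows; real input would be csv.DictReader over an export.
rows = csv.DictReader(io.StringIO(
    "catalognumber,storagename\n100,Cabinet B\n101,Cabinet A\n102,Cabinet B\n"
))
print(reimaging_list(rows, {"100", "102"}))
```

Grouping by location means each digitizer can pull all the specimens from one cabinet or box in a single pass.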

@beckerah
Author

beckerah commented Dec 4, 2024

You can follow the progress of this in the future here: #159

@beckerah beckerah closed this as completed Dec 4, 2024
@beckerah
Author

I've now got a list of over 1,000 specimens from the date range July 1 - October 1, 2024 that have no associated images. There were clearly a few folders that did not get ingested from the end of July. Now I need to match up these barcodes with their locations from the DigiApp exports so I can get a workable list together for the digitizers. I also need to find out if we have a re-imaging policy in place already or if that is still in development.

This list has been added to the QA image issues spreadsheet so all current re-imaging issues are in one place.
