Update taxonomy information for box 1-285 in Specify #117

RebekkaML · 2024-08-14T12:48:43Z

When digitization started at Herbarium C, neither author names nor taxon information below species level (hybrids, subspecies, varieties, formae) were recorded. Recording of this information only began with box 286, so it needs to be added to all entries from boxes 1-285.

For this, one spreadsheet was filled in with the author information for each taxon (#74).
Another sheet was filled in with any further taxonomic information (hybrids, subspecies, etc.) (#91).

The information from these two spreadsheets is now collected in one large table located on the N-Drive: "N:\SCI-SNM-DigitalCollections\DaSSCo\Workflows and workstations\Herbarium\Infraspecies spreadsheet\Infraspecies_table_filled_in.xlsx"

Before this information can be uploaded to Specify, some last issues need to be resolved:

Once these Issues are resolved, we can plan how to import the missing information into Specify.

RebekkaML · 2024-08-15T12:35:08Z

The issue "Resolve Infraspecies spreadsheet notes before import" (#114) was resolved by deciding to leave the taxonomy comments in for now and to deal with them after the import into Specify. This means that the "comments" column also needs to be imported, not just the author names and the subspecies, hybrid, etc. information.

RebekkaML · 2024-08-22T06:45:19Z

The related issues have been resolved and the updated and cleaned file is this:
Infraspecies_table_filled_in.xlsx

It can be found here: "N:\SCI-SNM-DigitalCollections\DaSSCo\Workflows and workstations\Herbarium\Infraspecies spreadsheet\Infraspecies_table_filled_in.xlsx"

The columns "subspecies _old" and "variety_old" refer to information that is already in Specify, in case this is important to distinguish.

The new information that needs to be imported is in the columns "Subspecies", "Subspecies_Author", "Variety", "Variety_Author", "Forma", "Forma_Author", "Hybrid_parent_1", "Hybrid_parent_1_Author", "Hybrid_parent_2", "Hybrid_parent_2_Author" and "Comment".

The table also includes the Collection Object ID and current taxon ID for each specimen.
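As a rough illustration of what "new information" means in practice, here is a minimal sketch that keeps only the rows with at least one of the new columns filled in. It assumes the sheet has already been loaded into a list of plain dicts (one per row, with empty cells as `None` or `""`). The two ID header names are hypothetical, as the actual headers in the spreadsheet may be spelled differently.

```python
# Columns carrying the new information, as listed in the comment above.
NEW_COLUMNS = [
    "Subspecies", "Subspecies_Author",
    "Variety", "Variety_Author",
    "Forma", "Forma_Author",
    "Hybrid_parent_1", "Hybrid_parent_1_Author",
    "Hybrid_parent_2", "Hybrid_parent_2_Author",
    "Comment",
]

# Hypothetical header names for the two ID columns; adjust to the real sheet.
ID_COLUMNS = ["CollectionObjectID", "TaxonID"]


def is_filled(value):
    """True if a cell holds something other than None or whitespace."""
    return value is not None and str(value).strip() != ""


def rows_to_import(rows):
    """Keep rows with at least one new column filled, reduced to the
    ID columns plus the new columns (missing cells become None)."""
    selected = []
    for row in rows:
        if any(is_filled(row.get(col)) for col in NEW_COLUMNS):
            selected.append({col: row.get(col) for col in ID_COLUMNS + NEW_COLUMNS})
    return selected
```

Rows that only carry "subspecies _old" or "variety_old" values are skipped, since that information is already in Specify.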

beckerah · 2024-10-08T09:35:08Z

I had a chat about this with Fedor, and he confirmed that there's no way to update records via workbench, which means we have two options:

1. Manually update each record (clearly we're not actually going to do this)
2. Update the records via the API

In order to use the API, we'll need to put together a script. This is going to require quite a bit of legwork, as I'll need to test the API calls and figure out all the primary & foreign keys, what to do about validation, etc. Bhupjit has already sent me a list of resources for playing around with this, which I'm tracking here: NHMDenmark/Projects/DaSSCo digitisation data/Research Specify API.
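One part of the legwork that can be sketched without touching the API at all is a validation pass over the spreadsheet rows before anything is sent. The example below is only an illustration under assumptions: the ID header names are hypothetical, rows are plain dicts, and the two checks (integer keys present, no author without a matching name) are my guess at useful sanity checks, not rules that Specify enforces.

```python
# Hypothetical header names for the key columns; adjust to the real sheet.
ID_COLUMNS = ["CollectionObjectID", "TaxonID"]

# Each infraspecific name column paired with its author column.
NAME_AUTHOR_PAIRS = [
    ("Subspecies", "Subspecies_Author"),
    ("Variety", "Variety_Author"),
    ("Forma", "Forma_Author"),
    ("Hybrid_parent_1", "Hybrid_parent_1_Author"),
    ("Hybrid_parent_2", "Hybrid_parent_2_Author"),
]


def _filled(value):
    return value is not None and str(value).strip() != ""


def validate_row(row):
    """Return a list of human-readable problems for one row (empty if OK)."""
    problems = []
    # Every row needs usable keys to find the record to update.
    for col in ID_COLUMNS:
        try:
            int(str(row.get(col, "")).strip())
        except ValueError:
            problems.append(f"{col} is missing or not an integer")
    # An author without the corresponding name is probably a data entry slip.
    for name_col, author_col in NAME_AUTHOR_PAIRS:
        if _filled(row.get(author_col)) and not _filled(row.get(name_col)):
            problems.append(f"{author_col} is set but {name_col} is empty")
    return problems
```

Running this over the whole table first would give a list of rows to fix by hand before any API calls are attempted.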

beckerah · 2024-10-09T14:33:43Z

Since Joaquim is already working on a script to update records in Specify via the API (for the transcription app), I can piggyback off of his efforts. I talked to him briefly about it on Slack and asked if he knew when that part would be ready. Here was his response:

I have not started working on it yet, but that's the plan. I should start working on it in a couple weeks, depending on how much changes will be needed on the transcription platform.
The first approach is to prepare a script that can ingest "formatted" data and push it into specify using the API. This should be achieved in a few weeks.
Then it could be extended to have an interface, and maybe allow users to allow some mapping of fields, and decide behaviours for conflicts, such as overwrite and ignore. Depending on the level of complexity it could take a bit more time, but desirably before the end of the year
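To make the "overwrite" and "ignore" conflict behaviours a bit more concrete, here is a minimal sketch of how they could work for a single record. This is not Joaquim's script, which does not exist yet; the function name and the dict-based record shape are assumptions made purely for illustration.

```python
def merge_fields(existing, incoming, on_conflict="ignore"):
    """Merge incoming field values into an existing record.

    Empty incoming values are skipped. Fields that are empty in the
    existing record are always filled. When both sides hold different
    non-empty values, that is a conflict: "overwrite" takes the incoming
    value, "ignore" keeps the existing one.
    Returns (merged_record, list_of_conflicting_fields).
    """
    if on_conflict not in ("overwrite", "ignore"):
        raise ValueError("on_conflict must be 'overwrite' or 'ignore'")

    merged = dict(existing)
    conflicts = []
    for field, new_value in incoming.items():
        if new_value is None or str(new_value).strip() == "":
            continue  # nothing to import for this field
        old_value = existing.get(field)
        if old_value in (None, ""):
            merged[field] = new_value  # no conflict, just fill the gap
        elif old_value != new_value:
            conflicts.append(field)
            if on_conflict == "overwrite":
                merged[field] = new_value
    return merged, conflicts
```

Returning the list of conflicting fields alongside the merged record means the conflicts can be logged for review regardless of which behaviour is chosen.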

Pip says this can wait, as it's lower priority than keeping digitization going and developing the new data pipeline for AU.

RebekkaML mentioned this issue Aug 14, 2024

Record subspecies, variety, forma etc. for boxes 1-285 in Herbarium C #91

Closed

beckerah self-assigned this Oct 2, 2024

PipBrewer mentioned this issue Oct 15, 2024

Early Digi app exports/imports without author name: backfill NHMDenmark/Mass-Digitizer#460

Closed
