Pilot conversion of MIxS TSV into a GBIF/OBIS system #53
@79-6d Do you have something that is in MIxS already? If so, I suggest we work with that. It would be great if we could come up with a notebook that does most of the conversion automatically, based on what's in the spreadsheet.
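A minimal sketch of what such a notebook cell might look like, assuming the spreadsheet has been exported as tab-separated text with MIxS column headers. The mapping table and the function names (`MIXS_TO_DWC`, `split_lat_lon`, `mixs_row_to_dwc`, `convert`) are hypothetical and only illustrate the idea; the real mapping would need to be agreed on in this repo. It also assumes `lat_lon` holds two decimal-degree numbers separated by a space.

```python
import csv
import io

# Hypothetical mapping from MIxS column names to Darwin Core terms.
# This is an illustration, not an agreed-upon crosswalk.
MIXS_TO_DWC = {
    "sample_name": "occurrenceID",
    "collection_date": "eventDate",
}


def split_lat_lon(value):
    """Split a 'lat_lon' string such as '-67.5 -68.1' into two floats.

    Assumes decimal degrees separated by whitespace.
    """
    lat, lon = value.split()
    return float(lat), float(lon)


def mixs_row_to_dwc(row):
    """Convert one MIxS row (a dict) into a dict keyed by DwC terms."""
    record = {dwc: row[mixs] for mixs, dwc in MIXS_TO_DWC.items() if row.get(mixs)}
    if row.get("lat_lon"):
        lat, lon = split_lat_lon(row["lat_lon"])
        record["decimalLatitude"] = lat
        record["decimalLongitude"] = lon
    return record


def convert(tsv_text):
    """Read tab-separated MIxS text and return a list of DwC-style records."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [mixs_row_to_dwc(row) for row in reader]


if __name__ == "__main__":
    sample = "sample_name\tcollection_date\tlat_lon\nS1\t2015-01-20\t-67.5 -68.1\n"
    for record in convert(sample):
        print(record)
```

Columns that are not in the mapping are silently dropped here; a real conversion would probably want to report them so nothing from the provider is lost unnoticed.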
I just realised that the dataset we have is from MIxS v2. I will write to our data provider to ask if she has something from MIxS v5. Great idea about working on it in a notebook!
For reference, here are two examples of test datasets in the GBIF test environment that use the DNA derived data extension:
SMHI Baltic Picoplankton (Marine) - about: https://www.ebi.ac.uk/ena/browser/view/PRJEB12362
Insect mobile (Terrestrial) - about: https://www.biorxiv.org/content/10.1101/2020.11.19.389742v1
Awesome!! Would you mind sharing the link to the repo showing how the conversion is made, if that's available? How would the EML part be addressed? Do you get the data provider to fill in that information, or is it something that will be extracted from the data?
@79-6d Both of these datasets were uploaded by the publishers through IPTs. The extension is of course not in production, but IPTs running in test mode detect the extension and can map to it.
I found a suitable marine 'omics dataset in our biodiversity.aq/POLA3R database that we can use to look at the conversion from MIxS to DwC. It is a microbial dataset where the authors used 16S rDNA amplicon sequencing to profile the community composition of Bacteria and Archaea in marine sediments. I think it's a good representative of a typical (small) microbial DNA-based dataset.
Here is the .xlsx file showing how we formatted it as MIxS; I adapted it to MIxS v5.
This dataset was published in:
Here is the link to the IPT: https://ipt.biodiversity.aq/resource?r=antarctic_marine_sediment_microbes
The sequences can be retrieved from here: https://www.ebi.ac.uk/ena/browser/view/PRJNA335729
Thanks @msweetlove,
I can open it just fine... Here is a version saved as tab separated txt, does this work? MIxS_testdataset_PRJNA335729.txt The original is a csv, but for some reason GitHub doesn't allow that format.
Yes - the txt file works for me. Thanks!
Is there any taxonomic annotation of the sequences available?
But I can't seem to find any information on the classification step (database, thresholds, etc.).
I don't have any more information than that... Like most microbial studies, these authors only provide the raw sequence data, because the methods to bin/cluster sequences, detect errors and taxonomically annotate sequences vary widely from lab to lab and the techniques evolve very fast over time... You can always try to contact the authors to ask whether they still have the original OTU tables, use your own pipeline to annotate the sequences, or request an analysis at MGnify: https://www.ebi.ac.uk/metagenomics/
OK, thanks. I just wanted to be sure that I didn't overlook anything.
@msweetlove the dataset is now in the GBIF test environment here:
This one is using the extension from this repo with the MIxS IRIs.
@timrobertson100 @thomasstjerne @cmungall @pieterprovoost @79-6d
Following our meeting today, would you mind scoping out how you'll test exchanging a MIxS TSV for an attempt to auto-convert it into a DwC Archive?
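As a starting point for scoping the packaging step, here is a rough sketch of how already-converted records could be written out as a tab-separated core file and an extension file inside a zip. The file names (`occurrence.txt`, `dnaderiveddata.txt`), the column choices, and the helper names (`to_tsv`, `build_archive`) are assumptions for illustration only. A complete DwC Archive would also need a `meta.xml` descriptor and an `eml.xml` metadata file, which are deliberately left out here.

```python
import csv
import io
import zipfile

# Assumed column split between the occurrence core and the DNA derived
# data extension; the real split would follow the extension definition.
CORE_COLUMNS = ["occurrenceID", "eventDate", "decimalLatitude", "decimalLongitude"]
EXTENSION_COLUMNS = ["occurrenceID", "target_gene", "seq_meth"]


def to_tsv(rows, columns):
    """Serialise a list of dicts as tab-separated text with a header row.

    Keys not listed in `columns` are ignored; missing keys become empty cells.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=columns,
        delimiter="\t",
        extrasaction="ignore",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def build_archive(records):
    """Return the bytes of a zip holding the core and extension data files.

    The meta.xml and eml.xml files of a full DwC Archive are not generated.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("occurrence.txt", to_tsv(records, CORE_COLUMNS))
        zf.writestr("dnaderiveddata.txt", to_tsv(records, EXTENSION_COLUMNS))
    return out.getvalue()


if __name__ == "__main__":
    demo = [{"occurrenceID": "S1", "eventDate": "2015-01-20",
             "target_gene": "16S rRNA", "seq_meth": "Illumina MiSeq"}]
    with zipfile.ZipFile(io.BytesIO(build_archive(demo))) as zf:
        print(zf.namelist())
```

A simple test for the exchange could then check that every sample in the MIxS TSV ends up as exactly one row in each of the two files, linked through the shared identifier column.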