I tried to download "slb" and "wd" databases using different versions (2022, 2021, 2020, 2019) but all returned an error. For example:
td_create(provider = "slb", version = 2019, overwrite = TRUE)
"could not find 2019_dwc_slb, 2019_common_slb
checking for older versions.
2019_dwc_slb not available
2019_common_slb not available"
Counting the records in each database, I noticed that the latest versions of "ncbi" and "col" have fewer records than older versions. I expected the latest version to have more records than the older ones.
library(dplyr) # provides %>%, summarise() and n()
taxadb::taxa_tbl("ncbi", version = 2022) %>% summarise(n()) # 2950147
taxadb::taxa_tbl("ncbi", version = 2021) %>% summarise(n()) # 3461657
taxadb::taxa_tbl("col", version = 2022) %>% summarise(n()) # 807599
taxadb::taxa_tbl("col", version = 2021) %>% summarise(n()) # 3615220
Finally, I noticed that databases are sometimes created with fewer records than expected, probably because of problems with my internet connection. For example, "ncbi" (v. 2022) had 32831 records instead of 2950147. I recognize that this is not a real issue, but it might be useful to check that the database has the expected number of records before performing queries. Just an idea.
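To make the idea concrete, here is a minimal sketch of such a check. `check_record_count()` is a hypothetical helper, not part of taxadb, and it assumes the caller already knows the expected record count (for instance from an earlier complete download). It uses dplyr so it should work on both local data frames and lazy database tables.

```r
library(dplyr)

# Hypothetical helper (not part of taxadb): stop if a table holds fewer
# records than expected. The expected count must be supplied by the caller.
check_record_count <- function(tbl, expected_n) {
  # summarise() + pull() also works on lazy database tables, where nrow() is NA
  actual_n <- tbl %>%
    summarise(n = n()) %>%
    pull(n) %>%
    as.numeric()
  if (actual_n < expected_n) {
    stop(sprintf("Table has %.0f records, expected at least %.0f.",
                 actual_n, expected_n))
  }
  invisible(actual_n)
}
```

With the numbers above, `check_record_count(taxadb::taxa_tbl("ncbi", version = 2022), 2950147)` would stop with an error on the 32831-record table instead of letting later queries run against incomplete data.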
The wikidata and slb databases haven't been ported to the new system. We don't actually have a good mechanism to assemble and update wikidata names, so that provider will probably be deprecated. The slb port is just part of my backlog, sorry.
Thanks for checking the NCBI / COL numbers; that looks like it could actually be an upstream bug. Note that taxadb checks the SHA-256 hash of the downloaded file, so a network issue on your end would throw an error.
More precisely, it looks like the 2022 NCBI tables include only species names. Names that resolve only to a higher taxonomic rank are not listed in the scientificName column (though they are still available from the dedicated rank columns):
> taxadb::taxa_tbl("ncbi") %>% count(taxonRank)
# Source: lazy query [?? x 2]
# Database: duckdb_connection
  taxonRank       n
  <chr>       <dbl>
1 species   2950147
So I think we need to fix the 2022 tables for NCBI and COL.
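For anyone who wants to see which ranks dropped out between versions, a rough sketch along these lines might help. `compare_rank_counts()` is a hypothetical helper, not part of taxadb, and it assumes both tables have a `taxonRank` column.

```r
library(dplyr)

# Hypothetical helper (not part of taxadb): tabulate taxonRank in two versions
# of a table and show the counts side by side. A rank that is absent from one
# version gets NA in that version's count column.
compare_rank_counts <- function(old_tbl, new_tbl) {
  # collect() pulls lazy database results into memory; it is a no-op locally
  old_counts <- old_tbl %>% count(taxonRank, name = "n_old") %>% collect()
  new_counts <- new_tbl %>% count(taxonRank, name = "n_new") %>% collect()
  full_join(old_counts, new_counts, by = "taxonRank") %>%
    arrange(taxonRank)
}
```

Called as `compare_rank_counts(taxadb::taxa_tbl("ncbi", version = 2021), taxadb::taxa_tbl("ncbi", version = 2022))`, any row with `NA` in `n_new` would be a rank present in 2021 but missing from 2022.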