Add ability to zero-fill data #16
Comments
Thanks @mstrimas, this is excellent as always! Rolling up to species makes sense as you describe, as does handling the "X" (observed only) case. I gather both incomplete checklists and genus-level-only identifications are just filtered/dropped from the data in this context? (Arguably some users may want to "roll up" to a higher taxonomic level than species, which I suppose could be supported as well, at least to genus.) I think I follow the process for the actual join. A few notes as I digest this:
I haven't really wrapped my head around what this looks like in a "real use case". (i.e. I definitely see why I need to join in the effort data from sampling_event_id's found in the checklists table that saw no birds at all, and thus aren't in the observations table. But I'm trying to wrap my head around if I really need to Good call on versioning, we can probably enforce that. Minor thing I noticed in your example code: I think you want an |
The missing I didn't realize some of the
Assuming users properly subset the data spatially and taxonomically, hopefully this will be feasible... If you just have a single species you don't need
|
Moving this discussion here since #15 is unrelated as initially reported.
Zero-filling is based on the concept of a "complete checklist", which means it only applies to checklists where `all_species_reported == 1`. In addition, only species can be zero-filled. For example, Yellow-rumped Warbler can be zero-filled, but taxa below species (e.g. "Yellow-rumped Warbler (Myrtle)") and taxa reported above species (e.g. "Warbler sp.") cannot be zero-filled. Scientific names corresponding to species can be identified as those in the table `auk::ebird_taxonomy` with `category == "species"`; this `category` column also appears in the EBD.

Rolling up taxa below species: eBirders can report taxa below the species level and, in many cases, a checklist can have multiple taxa all corresponding to the same species.
All of these observations need to be rolled up to the species level either prior to or immediately after zero-filling, since the complete checklist concept only applies to species. The typical way to do this is to group by `scientific_name` or `common_name` and sum `observation_count`. However, one catch is that `observation_count` can be "X", indicating the bird was detected but no count was recorded. Typically we address this by breaking `observation_count` into two variables indicating detection and count, respectively. However, we could also just retain a single column and use NA to indicate detection but no count; I've just found that's sometimes confusing for users.
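To make the roll-up concrete, here is a minimal sketch in plain Python (not the auk or R API). The sample rows, the `roll_up` helper, and the tuple layout are all hypothetical. It also assumes that a single "X" among the rows for a species makes the total unknown, which is one possible convention rather than something settled above.

```python
# Hypothetical observation rows: (checklist_id, scientific_name, observation_count).
# Counts are kept as strings because "X" means "detected, no count recorded".
observations = [
    ("S1", "Setophaga coronata", "3"),  # one taxon reported for the species
    ("S1", "Setophaga coronata", "2"),  # a second taxon of the same species
    ("S2", "Setophaga coronata", "X"),
    ("S2", "Setophaga coronata", "4"),
]


def roll_up(rows):
    """Collapse rows to one record per (checklist, species).

    Returns {(checklist_id, scientific_name): (detected, count)}.
    count is None when any contributing row was "X" (assumed convention:
    the species total is then unknown).
    """
    totals = {}
    unknown = set()
    for checklist_id, name, count in rows:
        key = (checklist_id, name)
        totals.setdefault(key, 0)
        if count == "X":
            unknown.add(key)
        else:
            totals[key] += int(count)
    # Every row is a detection, so detected is always True at this stage;
    # False values only appear once the data are zero-filled.
    return {
        key: (True, None if key in unknown else total)
        for key, total in totals.items()
    }


rolled = roll_up(observations)
```

Splitting the result into a detection flag and a count mirrors the two-variable approach; using `None` for the count plays the role of NA in the single-column alternative.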
Once taxa below species have been dealt with, you can zero-fill a single species by left-joining to the checklist data and replacing NAs with 0s. For multiple species, unless there's some fancy way to do this in SQL, you either need to loop through the species and left join one at a time, or left join and then use something like `tidyr::complete()` to fill in the missing cases.

A few additional notes that come to mind:
`ebd_relDec-2021.txt` and `ebd_sampling_relDec-2021.txt`.
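The zero-filling step described above can be sketched in plain Python (again, not the auk or R API). The checklist and observation data and the `zero_fill` helper are hypothetical. The sketch assumes observations have already been rolled up to species, and it crosses complete checklists with the requested species, which has the same effect as a left join followed by filling in the missing combinations.

```python
# Hypothetical sampling data: checklist_id -> all_species_reported flag.
checklists = {"S1": 1, "S2": 1, "S3": 0, "S4": 1}

# Hypothetical species-level observations, already rolled up:
# (checklist_id, scientific_name) -> count, with None standing in for "X".
observations = {
    ("S1", "Setophaga coronata"): 5,
    ("S2", "Setophaga coronata"): None,
    ("S3", "Setophaga coronata"): 1,  # on an incomplete checklist
    ("S1", "Cardinalis cardinalis"): 2,
}


def zero_fill(checklists, observations, species):
    """Return (checklist_id, scientific_name, detected, count) rows.

    Only complete checklists (all_species_reported == 1) are kept, and every
    complete checklist gets one row per requested species.
    """
    complete = [cid for cid, flag in checklists.items() if flag == 1]
    filled = []
    for cid in complete:
        for name in species:
            key = (cid, name)
            if key in observations:
                # Species reported on this checklist: keep its count as-is.
                filled.append((cid, name, True, observations[key]))
            else:
                # Complete checklist without the species: an inferred zero.
                filled.append((cid, name, False, 0))
    return filled


rows = zero_fill(
    checklists,
    observations,
    ["Setophaga coronata", "Cardinalis cardinalis"],
)
```

Checklist S3 is dropped because it is incomplete, and S4, which has no rows in the observation data at all, still gets a zero for each species. That is why the sampling data is needed alongside the observations.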