How to better display higher taxonomic unknown levels? #422

Closed
antgonza opened this issue Oct 14, 2020 · 0 comments · Fixed by #487

While discussing large trees and coloring by taxonomy, we realized that by default Q2 assigns "Unassigned" to sequences that didn't get an assignment, for example:
[Screenshot: Screen Shot 2020-10-13 at 10 47 43 AM]
Empress, by default, fills missing levels with "Unspecified". This can become problematic when looking at higher levels, like this:
[Screenshot: Screen Shot 2020-10-13 at 10 47 00 AM]
because it merges the assignments that were "Unassigned" at level 1 with those that simply lack a classification at that specific level.

The questions are:

  1. What should we do about it? There were a couple of suggestions: (a) repeat the latest available level in the missing ones, i.e., have "Unassigned" in all the levels, or "k__Bacteria" if that is the highest level available; and/or (b) do (a) but also add the level we are looking at to the label; from the example above, "Unassigned [L2]" or "k__Bacteria [L2]" in all the missing levels.
  2. Should we do this by default or control via a parameter?
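
To make the two suggestions concrete, here is a minimal Python sketch of the current fill behavior next to options (a) and (b). This is not Empress code: `split_lineage`, `label_at_level`, and the `strategy` names are hypothetical, and it assumes semicolon-delimited lineage strings and 1-indexed levels.

```python
from typing import List, Optional

UNSPECIFIED = "Unspecified"


def split_lineage(lineage: str, n_levels: int) -> List[Optional[str]]:
    """Split a semicolon-delimited lineage, padding missing levels with None."""
    parts = [p.strip() for p in lineage.split(";") if p.strip()]
    return parts + [None] * (n_levels - len(parts))


def label_at_level(levels: List[Optional[str]], level: int,
                   strategy: str = "unspecified") -> str:
    """Return the label shown at a 1-indexed level (hypothetical helper).

    strategy:
      "unspecified"       -- fill missing levels with "Unspecified"
      "repeat"            -- option (a): repeat the last available ancestor
      "repeat_with_level" -- option (b): ancestor plus the level being viewed
    """
    value = levels[level - 1]
    if value is not None:
        return value
    if strategy == "unspecified":
        return UNSPECIFIED
    # Walk back from the requested level to the closest assigned ancestor.
    ancestor = next(
        (v for v in reversed(levels[:level]) if v is not None), UNSPECIFIED
    )
    if strategy == "repeat":
        return ancestor
    if strategy == "repeat_with_level":
        return f"{ancestor} [L{level}]"
    raise ValueError(f"unknown strategy: {strategy}")


unassigned = split_lineage("Unassigned", 3)
bacteria = split_lineage("k__Bacteria", 3)

# With the plain fill, both features collapse into the same level-2 group.
print(label_at_level(unassigned, 2), label_at_level(bacteria, 2))
# Options (a) and (b) keep them apart.
print(label_at_level(unassigned, 2, "repeat"),
      label_at_level(bacteria, 2, "repeat_with_level"))
```

Exposing the choice as a single `strategy` argument is one way to address question 2: the default can stay as it is today while the other behaviors remain opt-in.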
fedarko added a commit to fedarko/empress that referenced this issue Feb 17, 2021
Closes biocore#473 and closes biocore#422.

Avoids having to store this stuff in the QZV, which will save
a lot of space. This does still involve storing the "expanded"
taxonomy for a given level all at once in memory, though -- I do
not think this will be horrendous, since it is not that much extra
data all things considered.
kwcantrell pushed a commit that referenced this issue Apr 8, 2021
… ancestor info in the JS interface) (#487)

* DOC: various match_inputs() comments/doc fixes

* ENH: Pass ordered tax col names to Empress JS #473

See docs enclosed for discussion vs. bool flag approach

* ENH: Implement ancestor fm retrieval in JS

Closes #473 and closes #422.

Avoids having to store this stuff in the QZV, which will save
a lot of space. This does still involve storing the "expanded"
taxonomy for a given level all at once in memory, though -- I do
not think this will be horrendous, since it is not that much extra
data all things considered.

* STY: prettify

* MNT: abstract fm retrieval func generation

still gotta test (manually then automatically) tho...

* TST: unbreak js tests

* TST: unbreak some of the python tests

not done yet

* TST: unbreak all python tests

* TST: abstract python taxcol checking

* STY: fix flake8 long lines

* STY: more flake8 fixes

* TST: test #473 special fm stuff in JS

* MNT: pass length fm retrieval through new funcs

for sake of future consistency. really shouldn't change anything,
behavior- or performance-wise

* BUG: Change max width to be in vw units, not vh

makes this slightly larger, i think

* PERF/DOC: Add note abt repeated work - fm barplots