Extracting barcodes per component #52
Comments
Your intuition is correct: any non-0 value indicates that a sample or feature (i.e. a cell barcode or transcript ID) contributes to that component. In theory you can apply any type of scoring, enrichment analysis, or summary statistic to the model, but it is often most effective (and least error-prone) to stay with the actual component weights for interpretation. Bear in mind that the resolution (rank) of the model is very important. The model can "hallucinate" by squishing together information that should not be in the same component (underfitting, when the rank is too low), or it can fail to appreciate information that should be viewed jointly (overfitting, when the rank is too high). This tradeoff is a hard one to really understand.
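To make the "non-0 weight means membership" rule concrete, here is a minimal sketch in plain Python. It is not the package's API: the function name `barcodes_per_component`, the `tol` parameter, and the toy barcodes and weights are all hypothetical, and it assumes the cell embedding has been exported as one row of non-negative weights per barcode.

```python
from typing import Dict, List, Sequence


def barcodes_per_component(
    barcodes: Sequence[str],
    embedding: Sequence[Sequence[float]],
    tol: float = 0.0,
) -> Dict[int, List[str]]:
    """Map each component index to the barcodes whose weight exceeds `tol`.

    Hypothetical helper, not part of any package. NMF weights are
    non-negative, so `weight > 0` is the same test as "non-zero".
    """
    n_components = len(embedding[0]) if embedding else 0
    members: Dict[int, List[str]] = {k: [] for k in range(n_components)}
    for barcode, row in zip(barcodes, embedding):
        for k, weight in enumerate(row):
            if weight > tol:
                members[k].append(barcode)
    return members


# Toy data (made up): 4 barcodes x 3 components.
barcodes = ["AAAC-1", "AAAG-1", "AACT-1", "AAGT-1"]
embedding = [
    [0.8, 0.0, 0.1],
    [0.0, 0.5, 0.0],
    [0.3, 0.2, 0.0],
    [0.0, 0.0, 0.0],  # contributes to no component
]

members = barcodes_per_component(barcodes, embedding)
for k, names in members.items():
    print(f"component {k}: {len(names)} barcodes -> {names}")
```

A barcode can appear under several components, and a barcode with all-zero weights appears under none. Raising `tol` above 0 is one simple way to ignore negligible weights, but that threshold is an assumption of this sketch and not something the model prescribes.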
Thank you for the explanation and quick response, it's much appreciated. Hopefully my first model has a good rank (21 for 125k barcodes). Feel free to close out!
Hi again, just following up here, if you don't mind. How are the barcodes for the MetadataSummary/MetadataPlot pulled? When I use all non-0 values of
Hello,
Terrific package, I started using it in October 2023. I've hit a few snags with MetadataPlot in the latest release (new install); I'll post a separate issue for that.
With respect to extracting the barcodes that contribute to each component: is my understanding correct that, with the sparse representation, every barcode with a non-0 value in the cell embeddings is treated as counting towards the component? I need to match this up with the barcodes that are retained per component in the MetadataPlot.
Or should I apply something more like Kim et al. (2007) to score and extract features, as is done in the NMF package?
Your guidance would be most appreciated here.
Thanks,
kmh005