Extracting barcodes per component #52

Open
kmh005 opened this issue Apr 3, 2024 · 3 comments

Comments

kmh005 commented Apr 3, 2024

Hello,

Terrific package; I started using it in October 2023. I've hit a few snags with MetadataPlot in the latest release (new install), and I'll post a separate issue for those.

With respect to extracting the barcodes that contribute to each component: is my understanding correct that, with the sparse representation, every barcode with a non-0 value in the cell embeddings counts towards that component? I need to match this up with the barcodes that are retained per component in the MetadataPlot.

Or should I instead apply something like the Kim et al. 2007 approach to score and extract features, as is done in the NMF package?

Your guidance would be most appreciated here.

Thanks,
kmh005

@zdebruine
Owner

Your intuition is correct that any non-0 value indicates contribution of any sample or feature (i.e. cell barcode or transcript ID) to that component. While in theory you can do any type of scoring, enrichment analysis, or summary statistic on the model, it sometimes is most effective (and least error-prone) to just stay with the actual component weights for interpretation.
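As a minimal sketch of the point above, the non-0 entries of each column of the embedding matrix can be listed directly. This uses a toy base-R matrix rather than a real object; the barcode names and the `NMF_1`-style column names are assumptions for illustration, and on a real Seurat object the matrix would come from the accessor quoted later in this thread (`object@reductions[["nmf"]]@cell.embeddings`).

```r
# Toy cell-embedding matrix: rows are barcodes, columns are NMF components.
# Row and column names here are hypothetical placeholders.
emb <- matrix(
  c(0.0, 1.2, 0.0,
    0.7, 0.0, 0.0,
    0.3, 0.4, 0.0,
    0.0, 0.0, 2.1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("barcode_", 1:4), paste0("NMF_", 1:3))
)

# For each component, keep the names of barcodes with a non-zero weight.
barcodes_per_component <- lapply(
  colnames(emb),
  function(k) rownames(emb)[emb[, k] != 0]
)
names(barcodes_per_component) <- colnames(emb)

# Number of contributing barcodes per component.
n_per_component <- colSums(emb != 0)
```

The weights themselves stay available in `emb`, so barcodes can still be ranked within a component instead of being treated as a simple in/out set.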

Of course, bear in mind that the resolution (rank) of the model is very important. The model can "hallucinate" by squishing together information that should not be in the same component (underfitting due to too low of a rank) or fail to appreciate information that should indeed be viewed jointly (overfitting due to too high of a rank), and this tradeoff is a hard one to really understand.

kmh005 commented Apr 3, 2024

Thank you for the explanation and quick response, it's much appreciated. Hopefully my first model has a good rank (21 for 125k barcodes). Feel free to close out!

kmh005 commented Apr 19, 2024

Hi again,

Just following up here, if you don't mind. How are the barcodes for the MetadataSummary/MetadataPlot pulled? When I use all non-0 values of object@reductions[["nmf"]]@cell.embeddings[,component], I am getting a much larger number of barcodes than are represented from MetadataSummary. For component 1, roughly 51k non-zero to 72k zero. Looking at the MetadataPlot, component 1 is largely dominated by 3 cell types, but they don't add up to nearly 51k. The same tracks through each component. A little more explanation would be appreciated, if you have the time. Thanks!
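One way to probe a mismatch like this is to compare, per cell type, how many barcodes are non-0 in a component with how much of the component's total weight they carry. The sketch below is a diagnostic on toy data only: `weights` and `cell_type` are hypothetical stand-ins for one embedding column and a metadata column, and it makes no claim about how MetadataSummary or MetadataPlot actually select or weight barcodes, which is not described in this thread.

```r
# Toy data standing in for one component's weights and a metadata column.
# 'weights' and 'cell_type' are hypothetical names for illustration.
weights   <- c(b1 = 5.0, b2 = 4.0, b3 = 0.05, b4 = 0.05, b5 = 0.0, b6 = 0.9)
cell_type <- c(b1 = "T", b2 = "T", b3 = "B", b4 = "Mono", b5 = "B", b6 = "NK")

# Barcodes with any non-zero weight in this component.
nonzero <- weights != 0

# Count of non-zero barcodes per cell type.
count_by_type <- table(cell_type[nonzero])

# Share of the component's total weight carried by each cell type.
weight_by_type <- tapply(weights, cell_type, sum)
weight_share   <- weight_by_type / sum(weight_by_type)
```

In this toy case four cell types have non-0 barcodes, yet one type carries most of the weight, so a count of non-0 barcodes and a weight-based summary can look quite different.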
