Extracting barcodes per component #52

kmh005 · 2024-04-03T14:09:19Z

Hello,

Terrific package; I started using it in October 2023. I've hit a few snags with MetadataPlot in the latest release (new install), which I'll post as a separate issue.

With respect to extracting the barcodes that contribute to each component: is my understanding correct that, with the sparse representation, every barcode with a non-zero value in the cell embeddings is treated as counting towards that component? I need to match this up with the barcodes that are retained per component in the MetadataPlot.

Or should I apply something more like Kim et al 2007 to score and extract the barcodes, as is done in the NMF package?

Your guidance would be most appreciated here.

Thanks,
kmh005

zdebruine · 2024-04-03T14:29:15Z

Your intuition is correct: any non-zero value indicates that the sample or feature (i.e. cell barcode or transcript ID) contributes to that component. While in theory you can do any type of scoring, enrichment analysis, or summary statistic on the model, it is sometimes most effective (and least error-prone) to just stay with the actual component weights for interpretation.
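As a minimal sketch of that rule, the snippet below collects, for each component, the barcodes with a non-zero weight. It uses a small made-up matrix named `emb` as a stand-in for `object@reductions[["nmf"]]@cell.embeddings` (rows are barcodes, columns are components); the matrix values, the barcode names, and the variable names are all illustrative assumptions, not output from the package.

```r
# Toy stand-in for object@reductions[["nmf"]]@cell.embeddings.
# Rows are cell barcodes, columns are NMF components; values are made up.
emb <- matrix(
  c(0.0, 0.8, 0.0,
    0.5, 0.0, 0.0,
    0.2, 0.1, 0.0,
    0.0, 0.0, 0.9),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("barcode_", 1:4), paste0("NMF_", 1:3))
)

# For each component, keep the barcodes whose weight is non-zero.
barcodes_per_component <- lapply(
  colnames(emb),
  function(k) rownames(emb)[emb[, k] != 0]
)
names(barcodes_per_component) <- colnames(emb)

# Number of contributing barcodes per component.
n_per_component <- lengths(barcodes_per_component)
print(n_per_component)
```

With a real object, replacing `emb` with the embeddings matrix should be enough, provided the matrix has row names. Note that this treats a weight of 0.001 the same as a weight of 0.9, so the resulting lists can be long.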

Of course, bear in mind that the resolution (rank) of the model is very important. The model can "hallucinate" by squishing together information that should not be in the same component (underfitting due to too low a rank), or it can fail to appreciate information that should indeed be viewed jointly (overfitting due to too high a rank). This tradeoff is a hard one to really understand.

kmh005 · 2024-04-03T19:25:39Z

Thank you for the explanation and quick response, it's much appreciated. Hopefully my first model has a good rank (21 for 125k barcodes). Feel free to close out!

kmh005 · 2024-04-19T16:29:31Z

Hi again,

Just following up here, if you don't mind. How are the barcodes for the MetadataSummary/MetadataPlot pulled? When I use all non-zero values of object@reductions[["nmf"]]@cell.embeddings[,component], I get a much larger number of barcodes than are represented in MetadataSummary. For component 1, the split is roughly 51k non-zero to 72k zero. Looking at the MetadataPlot, component 1 is largely dominated by 3 cell types, but they don't add up to nearly 51k. The same pattern holds for each component. A little more explanation would be appreciated, if you have the time. Thanks!
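One way to narrow down a discrepancy like this, without assuming anything about how MetadataSummary selects or weights barcodes, is to compare two views of a single component: how many barcodes per cell type have a non-zero weight, and how much of the component's total weight each cell type carries. The sketch below uses made-up weights and hypothetical cell-type labels (`weights`, `cell_type`); with real data these would be one column of the cell embeddings and the matching metadata column.

```r
# Toy weights for one component and a hypothetical cell-type label per barcode.
weights <- c(b1 = 0.90, b2 = 0.80, b3 = 0.01, b4 = 0.02, b5 = 0.00, b6 = 0.03)
cell_type <- factor(c("T", "T", "B", "B", "B", "NK"))

# Membership view: barcodes per cell type with any non-zero weight.
n_nonzero <- tapply(weights != 0, cell_type, sum)

# Weight view: share of the component's total weight per cell type.
weight_share <- tapply(weights, cell_type, sum) / sum(weights)

# Side-by-side comparison, one row per cell type.
comparison <- data.frame(
  cell_type    = levels(cell_type),
  n_nonzero    = as.vector(n_nonzero),
  weight_share = round(as.vector(weight_share), 3)
)
print(comparison)
```

In this toy case most barcodes are non-zero, yet nearly all of the weight sits in one cell type. If real data shows the same pattern, many small weights would be one plausible reason for a large non-zero count alongside a plot dominated by a few cell types.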
