Much larger element count in output matrix #127

rvernica · 2022-08-27T19:16:50Z


It looks like one of the corrected output matrices has significantly more stored elements than the corresponding input matrix. Is this expected?

The input looks like this:

[<5811x23034 sparse matrix of type '<class 'numpy.float64'>'
 	with 15076894 stored elements in Compressed Sparse Row format>,
 <5447x23953 sparse matrix of type '<class 'numpy.float64'>'
 	with 19462707 stored elements in Compressed Sparse Row format>]

The scanorama.correct log is this:

Found 21890 genes among all datasets
[[0.         0.72021296]
 [0.         0.        ]]
Processing datasets (0, 1)

The corrected output looks like this:

[<5811x21890 sparse matrix of type '<class 'numpy.float64'>'
 	with 15074955 stored elements in Compressed Sparse Row format>,
 <5447x21890 sparse matrix of type '<class 'numpy.float64'>'
 	with 118935312 stored elements in Compressed Sparse Row format>]

So, the inputs are 15M and 19M elements. The outputs are 15M and 118M. Why is the second output so much larger than the input? Is this expected?

I retried on a subset of these datasets and got the same pattern, except that now it is the first output matrix that has significantly more elements:
Input:

[<321x421 sparse matrix of type '<class 'numpy.float64'>'
 	with 20662 stored elements in Compressed Sparse Row format>,
 <659x486 sparse matrix of type '<class 'numpy.float64'>'
 	with 61978 stored elements in Compressed Sparse Row format>]

scanorama.correct log:

Found 405 genes among all datasets
[[0.         0.87227414]
 [0.         0.        ]]
Processing datasets (0, 1)

Output:

[<321x405 sparse matrix of type '<class 'numpy.float64'>'
 	with 129365 stored elements in Compressed Sparse Row format>,
 <659x405 sparse matrix of type '<class 'numpy.float64'>'
 	with 61755 stored elements in Compressed Sparse Row format>]

Notice the first input matrix has 20K elements while the first output matrix has 129K elements.
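One way to read these numbers is as fill fractions (stored elements divided by rows times columns). The sketch below only does that arithmetic on the counts reported above; the `density` helper and the labels are illustrative names, not part of Scanorama. By this measure the enlarged outputs are about 99.7% and 99.5% full, so they are effectively dense matrices stored in CSR format, and in both runs it is the matrix with fewer rows that ends up this way.

```python
def density(stored, rows, cols):
    """Fraction of the rows x cols grid occupied by stored elements."""
    return stored / (rows * cols)


# (label, stored elements, rows, columns), copied from the matrix reprs above.
reports = [
    ("full input 0", 15076894, 5811, 23034),
    ("full input 1", 19462707, 5447, 23953),
    ("full output 0", 15074955, 5811, 21890),
    ("full output 1", 118935312, 5447, 21890),
    ("subset input 0", 20662, 321, 421),
    ("subset input 1", 61978, 659, 486),
    ("subset output 0", 129365, 321, 405),
    ("subset output 1", 61755, 659, 405),
]

for label, stored, rows, cols in reports:
    print(f"{label:>16}: {density(stored, rows, cols):.1%} of entries stored")
```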

brianhie · 2022-08-28T19:01:02Z

Maintainer

Thanks for reaching out @rvernica! Scanorama will do a dense transformation of one matrix (and its corresponding distances) into the "space" of another matrix (by default, I believe the smaller matrix will be transformed). So, the larger matrix is unchanged, and the smaller matrix has its distances "corrected" based on the dense transformation.

I would probably encourage you to just use Scanorama integration in the low-dimensional (dense) space, since that seems to benchmark better, and I would not recommend interpreting anything in the corrected high-dimensional space beyond the distances defined by the vectors.
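To illustrate why a dense transformation inflates the stored-element count, here is a minimal sketch using NumPy and SciPy. It is not Scanorama's code: the toy matrix, its shape, and the random `shift` array standing in for a correction term are all assumptions made for illustration. The point is only that adding a dense, non-zero value to every entry leaves almost no zeros for the sparse format to skip.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)

# Hypothetical toy data: 50 cells x 40 genes with 10% of entries stored.
X = sparse.random(50, 40, density=0.1, format="csr",
                  random_state=0, dtype=np.float64)

# Stand-in for a correction term (assumption, not Scanorama's actual
# computation): a dense array with a non-zero shift for every entry.
shift = rng.normal(size=X.shape)

# Applying a dense shift requires a dense intermediate; converting the
# result back to CSR keeps every non-zero value, which is nearly all of them.
corrected = sparse.csr_matrix(X.toarray() + shift)

print("stored elements before:", X.nnz)
print("stored elements after: ", corrected.nnz)
print("total entries:         ", X.shape[0] * X.shape[1])
```

If memory is a concern, a matrix that is this full is usually cheaper to hold as a plain dense array than in CSR, since CSR also stores a column index for every element.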
