
Several issues when using scanpy object converted from seurat #7

Closed
ChristinaSteyn opened this issue Feb 4, 2022 · 3 comments


@ChristinaSteyn

Hi there,

Thank you so much for creating such an awesome tool! I am quite new to coding, especially in Python, and I encountered several errors when running the NSForest function. I am not sure whether these problems are specific to my object, but I thought I would post them here in case someone else runs into the same issues. I have managed to fix some of the errors and got the function to finish, but I'm not 100% sure the output is correct.

#---------------------------------------------------------------------------------------------------------------------------------------
The first error was in line 167 of the source code:
"AttributeError: Can only use .cat accessor with a 'category' dtype"
I changed this:

medianValues = pd.DataFrame(columns=adata.var_names, index=adata.obs[clusterLabelcolumnHeader].cat.categories)

to this:

medianValues = pd.DataFrame(columns=adata.var_names, index=adata.obs[clusterLabelcolumnHeader].unique())

which seemed to work.
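A note on this fix: the `.cat` accessor only exists for columns with pandas' `category` dtype, and cluster labels coming from a converted Seurat object may be stored as plain strings instead. A minimal sketch on a toy column (the `obs` DataFrame and its `"cluster"` column are hypothetical stand-ins for `adata.obs[clusterLabelcolumnHeader]`) showing the original accessor failing, the `.unique()` workaround, and an alternative cast to `category`:

```python
import pandas as pd

# Hypothetical stand-in for adata.obs: cluster labels stored as plain strings (object dtype).
obs = pd.DataFrame({"cluster": ["T cell", "B cell", "T cell", "NK"]})

# The original line fails on a non-categorical column.
try:
    obs["cluster"].cat.categories
    cat_error = None
except AttributeError as err:
    cat_error = str(err)  # "Can only use .cat accessor with a 'category' dtype"

# Workaround used above: labels in order of first appearance.
unique_labels = list(obs["cluster"].unique())

# Alternative: cast once to 'category'; .cat.categories then works and is sorted.
obs["cluster"] = obs["cluster"].astype("category")
categories = list(obs["cluster"].cat.categories)

print(cat_error)
print(unique_labels)  # ['T cell', 'B cell', 'NK']
print(categories)     # ['B cell', 'NK', 'T cell']
```

Both routes give the same set of cluster labels; only the row order of `medianValues` differs (order of first appearance versus sorted categories). anndata's `AnnData.strings_to_categoricals()` can also convert the string columns of `obs` to categoricals in one call.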

#----------------------------------------------------------------------------------------------------------------------------------------------
The second error was in line 172:
"ValueError: Shape of passed values is (49211, 1), indices imply (49211, 31002)", which I think occurred because the input used to create the pandas DataFrame was in the wrong format.

When I changed this:

Subset_dataframe = pd.DataFrame(data = subset_adata.X, index = subset_adata.obs, columns = subset_adata.var_names)

to this:

Subset_dataframe = pd.DataFrame(data = subset_adata.X.toarray(), index = list(subset_adata.obs["cells"].tolist()), columns = subset_adata.var_names)

it seemed to work.
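One plausible reading of the `(49211, 1)` message is that pandas did not treat `subset_adata.X` as a 2-D array: in a converted object `X` is often a scipy sparse matrix, and `index = subset_adata.obs` passes a whole DataFrame rather than cell names. A minimal sketch of a helper (the name `expression_frame` is hypothetical) that densifies only when `X` is sparse, since `.toarray()` raises an `AttributeError` on an already dense NumPy `X`, and that labels rows by `obs_names`, which avoids relying on an object-specific column such as `"cells"`:

```python
import numpy as np
import pandas as pd
from scipy import sparse

def expression_frame(X, obs_names, var_names):
    """Build a cells x genes DataFrame from a dense or scipy-sparse matrix (hypothetical helper)."""
    # .toarray() exists only on sparse matrices, so densify conditionally.
    dense = X.toarray() if sparse.issparse(X) else np.asarray(X)
    # Rows are labelled by cell names, columns by gene names.
    return pd.DataFrame(dense, index=list(obs_names), columns=list(var_names))

# Toy example: 3 cells x 2 genes stored sparsely.
X_sparse = sparse.csr_matrix(np.array([[0.0, 1.5], [2.0, 0.0], [0.0, 0.0]]))
df = expression_frame(X_sparse, ["cell1", "cell2", "cell3"], ["GeneA", "GeneB"])
print(df.shape)  # (3, 2)
```

With an AnnData object this would be called as `expression_frame(subset_adata.X, subset_adata.obs_names, subset_adata.var_names)`; anndata's own `AnnData.to_df()` builds an equivalent cells × genes DataFrame directly.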

#------------------------------------------------------------------------------------------------------------------------------------------------
A similar problem occurred in line 121 of the source code, when running:

def fbetaTest(x, column, adata, Binary_RankedList, testArray, betaValue = 0.5)

I got this error: "ValueError: Shape of passed values is (113957, 1), indices imply (113957, 0)"

I changed this:

Subset_dataframe = pd.DataFrame(data = subset_adata.X, index = subset_adata.obs_names, columns = subset_adata.var_names)

to this:

Subset_dataframe = pd.DataFrame(data=subset_adata.X.toarray(), index=subset_adata.obs_names.tolist(), columns=subset_adata.var_names)

and it seemed to work.
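One caveat with calling `.toarray()` on the full matrix: it materializes every cell × gene value. For the 49,211 × 31,002 matrix in the earlier error that is about 1.53 billion entries, roughly 6.1 GB as float32 or 12.2 GB as float64. When only a few genes are needed, a sketch that selects those columns while the matrix is still sparse and densifies just the slice (the helper `genes_frame` is hypothetical and assumes gene names are unique) may keep memory manageable:

```python
import numpy as np
import pandas as pd
from scipy import sparse

def genes_frame(X, obs_names, var_names, genes):
    """Densify only the requested gene columns of a cells x genes matrix (hypothetical helper)."""
    # get_indexer needs unique gene names; -1 marks a gene that is absent.
    cols = pd.Index(var_names).get_indexer(genes)
    missing = [g for g, c in zip(genes, cols) if c < 0]
    if missing:
        raise KeyError(f"genes not found: {missing}")
    sub = X[:, cols]  # column selection on a sparse matrix stays sparse
    dense = sub.toarray() if sparse.issparse(sub) else np.asarray(sub)
    return pd.DataFrame(dense, index=list(obs_names), columns=list(genes))

X = sparse.csr_matrix(np.array([[1.0, 0.0, 3.0], [0.0, 2.0, 0.0]]))
small = genes_frame(X, ["cell1", "cell2"], ["G1", "G2", "G3"], ["G3", "G1"])
print(small)
```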

#-------------------------------------------------------------------------------------------------------------------------------------

Another error was in line 94 of the source code:
"IndexError: Index dimension must be <= 2"

X = x_train[:, None]

I don't think this line is actually necessary, so I commented it out, which seemed to fix the problem.
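`x[:, None]` adds a trailing axis to turn a 1-D array into a single column. That works for NumPy arrays but not for every container; scipy sparse matrices, for example, are always 2-D, so a single-gene slice is already `(n, 1)`. If the line exists to give a downstream model an `(n_samples, 1)` input (an assumption about the original code), a type-agnostic sketch such as the hypothetical `as_column` below could replace it instead of removing it:

```python
import numpy as np
import pandas as pd
from scipy import sparse

def as_column(x):
    """Return a single feature as a dense (n_samples, 1) array (hypothetical helper).

    Assumes x holds exactly one feature; a multi-column input would be flattened.
    """
    if sparse.issparse(x):
        # Sparse matrices are always 2-D, so densify and make sure it is a column.
        return x.toarray().reshape(-1, 1)
    # 1-D NumPy arrays and pandas Series: reshape instead of x[:, None].
    return np.asarray(x).reshape(-1, 1)

print(as_column(np.array([0.5, 1.0, 0.0])).shape)                 # (3, 1)
print(as_column(pd.Series([0.5, 1.0, 0.0])).shape)                # (3, 1)
print(as_column(sparse.csr_matrix([[0.5], [1.0], [0.0]])).shape)  # (3, 1)
```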

#----------------------------------------------------------------------------------------------------------------------------------------------
I also got several errors in the last section. I couldn't quite figure out what the problems were, but I changed the code from this:

#Move binary genes to Results dataframe
clusters2Genes = pd.DataFrame(columns = ['Gene', 'clusterName'])
clusters2Genes["clusterName"] = Binary_score_store_DF["clusterName"]
clusters2Genes["Gene"] = Binary_score_store_DF.index
GroupedBinarylist = clusters2Genes.groupby('clusterName').apply(lambda x: x['Gene'].unique()) 
BinaryFinal = pd.DataFrame(columns = ['clusterName','Binary_Genes'])
BinaryFinal['clusterName'] = GroupedBinarylist.index
BinaryFinal['Binary_Genes'] = GroupedBinarylist.values

to this:

Binary_score_store_DF = pd.read_csv('NS-Forest_v3_Extended_Binary_Markers_Supplmental.csv')

# Move binary genes to Results dataframe
clusters2Genes = pd.DataFrame(columns=['Gene', 'clusterName'])
clusters2Genes["clusterName"] = Binary_score_store_DF["clusterName"]
clusters2Genes["Gene"] = Binary_score_store_DF["Unnamed: 0"]
clusters2Genes.to_csv('clusters2Genes.csv')
#GroupedBinarylist = clusters2Genes.groupby('clusterName').apply(lambda x: x['Gene'].unique())
#GroupedBinarylist = clusters2Genes.apply(lambda x: x['Gene'].unique()) #This seemed to work earlier

BinaryFinal = pd.DataFrame(columns=['clusterName', 'Binary_Genes'])
BinaryFinal['clusterName'] = clusters2Genes["clusterName"]
BinaryFinal['Binary_Genes'] = clusters2Genes["Gene"]
BinaryFinal.to_csv('BinaryFinal.csv')
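The `"Unnamed: 0"` column appears because the gene names were stored in the DataFrame's index when the CSV was written, and `read_csv` turns that headerless first column into an ordinary column. A minimal sketch on toy data (`scores` is a hypothetical stand-in for `Binary_score_store_DF`) showing that reading with `index_col=0` restores the names as the index, so the original `Binary_score_store_DF.index` line would presumably work on the reloaded file:

```python
import io
import pandas as pd

# Hypothetical stand-in for Binary_score_store_DF: gene names in an unnamed index.
scores = pd.DataFrame({"clusterName": ["c1", "c2"]}, index=["GeneA", "GeneC"])

buf = io.StringIO()
scores.to_csv(buf)  # the unnamed index is written as a headerless first column

buf.seek(0)
plain = pd.read_csv(buf)  # gene names come back as a column called "Unnamed: 0"

buf.seek(0)
restored = pd.read_csv(buf, index_col=0)  # gene names come back as the row index

print(list(plain.columns))    # ['Unnamed: 0', 'clusterName']
print(list(restored.index))   # ['GeneA', 'GeneC']
```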

It seems that in this line of code:

clusters2Genes["clusterName"] = Binary_score_store_DF["clusterName"]

the column name in the Binary_score_store_DF DataFrame was incorrect.

And in this line of code:

GroupedBinarylist = clusters2Genes.groupby('clusterName').apply(lambda x: x['Gene'].unique())

the clusters already seem to be grouped together and all the gene names already seem to be unique?
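On the grouping question: each row of `clusters2Genes` is one cluster/gene pair, and the original `groupby('clusterName').apply(lambda x: x['Gene'].unique())` collapses those rows into one row per cluster holding an array of that cluster's unique genes. The workaround above skips that step, so its `BinaryFinal` should contain the same pairs in a long, one-row-per-pair layout rather than one row per cluster. A minimal sketch on toy data (names hypothetical) showing both shapes, using `groupby(...)["Gene"].unique()`, which should be equivalent to the `apply` form:

```python
import pandas as pd

# Hypothetical stand-in for Binary_score_store_DF: one row per (gene, cluster) pair,
# indexed by gene name as in the original code.
binary_scores = pd.DataFrame(
    {"clusterName": ["c1", "c1", "c2", "c2"]},
    index=pd.Index(["GeneA", "GeneB", "GeneA", "GeneC"], name="Gene"),
)

# Long format: one row per cluster/gene pair (what the workaround writes out).
clusters2Genes = binary_scores.reset_index()[["Gene", "clusterName"]]

# Wide format: one row per cluster with an array of its unique genes
# (what the original groupby/apply was building).
BinaryFinal = (
    clusters2Genes.groupby("clusterName")["Gene"]
    .unique()
    .reset_index(name="Binary_Genes")
)
print(BinaryFinal)
```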

I couldn't get around this last problem, so I ended up commenting out those lines, and now I am not sure whether my output files are what they should be. Any advice would be greatly appreciated!

@adRn-s

adRn-s commented May 5, 2022

Hi there, I was just trying to test NS-Forest 3.0 on a Seurat dataset I have. Sadly, I got at least two of the very same errors and applied your fixes. They worked OK, so thanks for posting! But afterwards I got a different error (key 'cells' not found). I'm not patient enough to fix it, but I wanted to comment on this for the record.

@ChristinaSteyn
Author

Thanks, I really appreciate hearing that someone else had the same problems and it is not just me!

@yunzhang813
Collaborator

Thanks for the ticket. The code has been refactored in v4.0.
