Scikit-learn version issue ValueError: node array from the pickle has an incompatible dtype #30

Closed
erfanshekarriz opened this issue Jan 18, 2024 · 1 comment

Hello there.

I was pretty stoked to use vRhyme for my viral binning protocol, but unfortunately haven't been successful in running the program without any errors.

Initially I couldn't supply my own sorted BAM files generated with minimap2 -x sr (which is 3-4 times faster and also more accurate than bowtie2; I would recommend adding it as a mapping option). vRhyme gave me the same error as Issue #26 https://github.com/AnantharamanLab/vRhyme/issues/26 and did not produce the coverage table. I then gave up on that route and tried the internal bowtie2 aligner instead, but ran into a different error.

This is the command I ran:

python workflow/software/vRhyme/vRhyme/vRhyme -i combined.viralcontigs.fa -r DRR093002_R1.fastq.gz DRR093002_R2.fastq.gz DRR093003_R1.fastq.gz DRR093003_R2.fastq.gz DRR093004_R1.fastq.gz DRR093004_R2.fastq.gz -l 2000 -t 32 -o deepsea/test_res/binning/viral/tmp/vrhyme/hydrothermal-vent-BMS --verbose

This time I checked the log_vRhyme_paired_reads.tsv file and the pairings are correct. I also checked that the vRhyme_coverage_values.tsv file is not empty.

Despite that I get the following log and error:

Date:     2024-01-18 (y-m-d)
Start:    14:32:05   (h:m:s)
Program:  vRhyme v1.1.0


Time (min) |  Log                                                   
--------------------------------------------------------------------
0.0           Initializing and validating vRhyme parameters
0.01          Paired end read file(s) identified. Running bowtie2 on 3 set of paired files
5.13          Extracting coverage information from BAM files
5.82          Coverage extraction complete. Generating coverage table
5.82          Performing pairwise coverage comparisons
5.86          Running Prodigal on filtered sequences
5.95          Generating codon usage features
5.95          Generating nucleotide features
5.99          Performing pairwise distance calculations
6.0           Performing machine learning classification
workflow/software/vRhyme/vRhyme/vRhyme:16: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  import pkg_resources
Traceback (most recent call last):
  File "workingdir/vRhyme/vRhyme/vRhyme", line 960, in <module>
    net_data = machine_stuff.machine_stuff(distances, presets, model_method, pairs_machine, cohen_machine, iterations, cohen_check)
  File "workingdir/vRhyme/vRhyme/scripts/machine_stuff.py", line 73, in machine_stuff
    model_ET = pickle.load(read_model_ET)
  File "sklearn/tree/_tree.pyx", line 728, in sklearn.tree._tree.Tree.__setstate__
  File "sklearn/tree/_tree.pyx", line 1434, in sklearn.tree._tree._check_node_ndarray
ValueError: node array from the pickle has an incompatible dtype:
- expected: {'names': ['left_child', 'right_child', 'feature', 'threshold', 'impurity', 'n_node_samples', 'weighted_n_node_samples', 'missing_go_to_left'], 'formats': ['<i8', '<i8', '<i8', '<f8', '<f8', '<i8', '<f8', 'u1'], 'offsets': [0, 8, 16, 24, 32, 40, 48, 56], 'itemsize': 64}
- got     : [('left_child', '<i8'), ('right_child', '<i8'), ('feature', '<i8'), ('threshold', '<f8'), ('impurity', '<f8'), ('n_node_samples', '<i8'), ('weighted_n_node_samples', '<f8')]

Any idea how we can resolve this? I was reading some blogs online saying it's related to the version of scikit-learn. If that is the case, could you pin the scikit-learn version in the conda installation instructions? That way we can be confident of reproducing your results.
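
For reference, the two dtype listings in the ValueError differ by a single field. The sketch below only compares the field names copied from the "expected" and "got" lines of the error; the variable names are illustrative and nothing here touches vRhyme or scikit-learn itself.

```python
# Field names copied from the "expected" line of the ValueError
# (what the installed scikit-learn wants in each tree node).
expected_fields = [
    "left_child", "right_child", "feature", "threshold", "impurity",
    "n_node_samples", "weighted_n_node_samples", "missing_go_to_left",
]

# Field names copied from the "got" line (what the pickled model contains).
pickled_fields = [
    "left_child", "right_child", "feature", "threshold", "impurity",
    "n_node_samples", "weighted_n_node_samples",
]

# Fields the installed library expects but the pickle lacks, and vice versa.
missing = [name for name in expected_fields if name not in pickled_fields]
extra = [name for name in pickled_fields if name not in expected_fields]

print("expected but absent from the pickle:", missing)
print("present in the pickle but not expected:", extra)
```

The only difference is `missing_go_to_left`. That is consistent with the model having been pickled by an older scikit-learn than the one installed here (to my knowledge this field arrived with missing-value support for trees in scikit-learn 1.3, but treat that version number as an assumption rather than something confirmed in this thread).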

If you need my raw sequence files I'm happy to somehow send them to you. I can also send you the bam files generated from minimap2.

Best,

Erfan

erfanshekarriz commented Jan 18, 2024

I've resolved this issue by enforcing the scikit-learn version:

mamba create -c bioconda -n vRhyme python=3 networkx pandas numpy numba scikit-learn==1.2.2 pysam samtools mash mummer mmseqs2 prodigal bowtie2 bwa

Please help me update the installation instructions in the README.md file. I would also strongly recommend noting the versions of all the software above to ensure long-term stability and reproducibility.
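
Until the versions are pinned, a guard like the following could fail early with a readable message instead of the dtype traceback. This is only a sketch: `KNOWN_GOOD`, `release_tuple`, `same_minor` and `check_sklearn` are hypothetical names, not part of vRhyme, and 1.2.2 is simply the version that worked in this thread, not a confirmed training version for the bundled models. Matching on the minor release is a heuristic chosen here; scikit-learn does not promise pickle compatibility across versions.

```python
from importlib import metadata

# Assumption: 1.2.2 is the scikit-learn version that worked in this thread.
KNOWN_GOOD = "1.2.2"


def release_tuple(version):
    """Turn '1.2.2' into (1, 2, 2), dropping suffixes such as 'rc1'."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def same_minor(installed, known_good=KNOWN_GOOD):
    """True when both versions share the same major.minor release."""
    return release_tuple(installed)[:2] == release_tuple(known_good)[:2]


def check_sklearn(known_good=KNOWN_GOOD):
    """Raise a clear error if the installed scikit-learn looks incompatible."""
    try:
        installed = metadata.version("scikit-learn")
    except metadata.PackageNotFoundError:
        raise RuntimeError("scikit-learn is not installed") from None
    if not same_minor(installed, known_good):
        raise RuntimeError(
            f"scikit-learn {installed} found, but the pickled models are only "
            f"known to load with {known_good}; try scikit-learn=={known_good}"
        )
    return installed
```

Calling `check_sklearn()` just before the `pickle.load` step would be the natural place for such a check.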

Best,

Erfan
