core dump #6

Open
linzhi2013 opened this issue Jun 15, 2022 · 1 comment
@linzhi2013

linzhi2013 commented Jun 15, 2022

Hi there,

Thanks for the tool!

When I tried MeShClust v3.0, I got a core dump error. Do you have any suggestions? Thank you!

The compute node of the cluster has 56 cores (112 threads) and 1.5 TB of RAM, and we did not limit how much RAM MeShClust could use.

Best
Guanliang

-rw-rw-r-- 1 gmeng 1.5G Jun 13 17:15 combined.fa
-rw-rw-r-- 1 gmeng  112 Jun 14 10:00 meshclust3.sh
-rw-r--r-- 1 gmeng 5.9K Jun 15 20:16 meshclust3.sh.o539214
-rw------- 1 gmeng  18G Jun 15 22:18 core.229599
-rw-r--r-- 1 gmeng  416 Jun 15 22:18 meshclust3.sh.e539214
$ grep -c '>' combined.fa
5652580

meshclust3.sh:

/home/gmeng/soft/MeShClust_v3/Identity/bin/meshclust -d combined.fa -t 0.6  -o out.clstr -c 80 -e y -a n -p 10

meshclust3.sh.o539214:

MeShClust v3.0 is developed by Hani Z. Girgis, PhD.

This program clusters DNA sequences using identity scores obtained without alignment.

Copyright (C) 2021-2022 Hani Z. Girgis, PhD

Academic use: Affero General Public License version 1.

Any restrictions to use for profit or non-academics: Alternative commercial license is required.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Please contact Dr. Hani Z. Girgis ([email protected]) if you need more information.

Please cite the following papers:
	1. Identity: Rapid alignment-free prediction of sequence alignment identity scores using
	self-supervised general linear models. Hani Z. Girgis, Benjamin T. James, and Brian B.
	Luczak. NAR GAB, 3(1):lqab001, 2021.
	2. MeShClust: an intelligent tool for clustering DNA sequences. Benjamin T. James,
	Brian B. Luczak, and Hani Z. Girgis. Nucleic Acids Res, 46(14):e83, 2018.
	3. MeShClust v3.0: High-quality clustering of DNA sequences using the mean shift algorithm
	and alignment-free identity scores. Hani Z. Girgis. A great journal. 2022.

Database file: combined.fa
Output file: out.clstr
Cores: 80
Provided threshold: 0.6
Block size for all vs. all: 25000
Block size for reading sequences: 100000
Number of data passes: 10
Can assign all: No


Average: 756
K: 4
Histogram size: 256
A histogram entry is 16 bits.
Generating data.
Preparing data ...
	Positive examples: 10000
	Training size: 5000
	Validation size: 5000
Better performance of: 0.00324074
	chi_squared x jeffrey_divergence
Better performance of: 0.00278104
	chi_squared x jeffrey_divergence
	chi_squared^2 x d2_s_r^2
Better performance of: 0.00275123
	chi_squared x jeffrey_divergence
	chi_squared^2 x d2_s_r^2
	squared_chord^2 x hellinger^2
Better performance of: 0.00271437
	chi_squared x jeffrey_divergence
	chi_squared^2 x d2_s_r^2
	bray_curtis^2 x d2_s_r^2
	squared_chord^2 x hellinger^2
Better performance of: 0.00266334
	chi_squared x squared_chord
	chi_squared x jeffrey_divergence
	chi_squared^2 x d2_s_r^2
	bray_curtis^2 x d2_s_r^2
	squared_chord^2 x hellinger^2
	kulczynski_2^2 x d2_s_r^2
Better performance of: 0.00263148
	squared_chord
	chi_squared x squared_chord
	chi_squared x jeffrey_divergence
	chi_squared^2 x d2_s_r^2
	bray_curtis^2 x d2_s_r^2
	squared_chord^2 x hellinger^2
	kulczynski_2^2 x d2_s_r^2
Better performance of: 0.00257594
	squared_chord
	chi_squared x squared_chord
	chi_squared x jeffrey_divergence
	hellinger x hellinger^2
	chi_squared^2 x d2_s_r^2
	bray_curtis^2 x d2_s_r^2
	squared_chord^2 x hellinger^2
	kulczynski_2^2 x d2_s_r^2
Better performance of: 0.00249854
	squared_chord
	manhattan x simMM
	chi_squared x squared_chord
	chi_squared x jeffrey_divergence
	hellinger x hellinger^2
	chi_squared^2 x d2_s_r^2
	bray_curtis^2 x d2_s_r^2
	squared_chord^2 x hellinger^2
	kulczynski_2^2 x d2_s_r^2
Selected statistics:
	squared_chord
	manhattan x simMM
	chi_squared x squared_chord
	chi_squared x jeffrey_divergence
	hellinger x hellinger^2
	chi_squared^2 x d2_s_r^2
	bray_curtis^2 x d2_s_r^2
	squared_chord^2 x hellinger^2
	kulczynski_2^2 x d2_s_r^2
Finished training.
	MAE: 0.036734
	MSE: 0.00249854
Optimizing ...
Validating ...
	MAE: 0.0426102
	MSE: 0.00325363

Clustering ...

Data run 1 ...
	Processed sequences: 25000
	Unprocessed sequences: 0
	Found centers: 772
	Processed sequences: 50000
	Unprocessed sequences: 24657
	Found centers: 770
	Processed sequences: 100478
	Unprocessed sequences: 41448
	Found centers: 1278
	Processed sequences: 166024
	Unprocessed sequences: 32518
	Found centers: 2628
	Processed sequences: 206655
	Unprocessed sequences: 27580
	Found centers: 3034
	Processed sequences: 338846
	Unprocessed sequences: 65658
	Found centers: 3620
	Processed sequences: 348903
	Unprocessed sequences: 50307
	Found centers: 4308
	Processed sequences: 414183
	Unprocessed sequences: 67888
	Found centers: 4653
	Processed sequences: 428889
	Unprocessed sequences: 56801
	Found centers: 5147
	Processed sequences: 473924
	Unprocessed sequences: 66571
	Found centers: 5560
	Processed sequences: 591912
	Unprocessed sequences: 101368
	Found centers: 6457
	Processed sequences: 599863
	Unprocessed sequences: 83946
	Found centers: 6943
	Processed sequences: 682732
	Unprocessed sequences: 112078
	Found centers: 7277
	Processed sequences: 694499
	Unprocessed sequences: 97930
	Found centers: 7757
	Processed sequences: 752209
	Unprocessed sequences: 114752
	Found centers: 8067
	Processed sequences: 767163
	Unprocessed sequences: 94407
	Found centers: 8447
	Processed sequences: 867163
	Unprocessed sequences: 141679
	Found centers: 8792
	Processed sequences: 875812
	Unprocessed sequences: 125026
	Found centers: 9248
	Processed sequences: 950986
	Unprocessed sequences: 155363
	Found centers: 9586
	Processed sequences: 962281
	Unprocessed sequences: 137454
	Found centers: 10001
	Processed sequences: 1050620
	Unprocessed sequences: 173768
	Found centers: 10430
	Processed sequences: 1060816
	Unprocessed sequences: 156809
	Found centers: 10884
	Processed sequences: 1138833
	Unprocessed sequences: 189905
	Found centers: 11240
	Processed sequences: 1219898
	Unprocessed sequences: 191996
	Found centers: 12162
	Processed sequences: 1234377
	Unprocessed sequences: 173682
	Found centers: 12615
	Processed sequences: 1328038
	Unprocessed sequences: 210768
	Found centers: 13095
	Processed sequences: 1338108
	Unprocessed sequences: 194114
	Found centers: 13563
	Processed sequences: 1413309
	Unprocessed sequences: 217638
	Found centers: 13916
	Processed sequences: 1426200
	Unprocessed sequences: 203726
	Found centers: 14366
	Processed sequences: 1482720
	Unprocessed sequences: 217439
	Found centers: 14648
	Processed sequences: 1549592
	Unprocessed sequences: 216905
	Found centers: 15453
	Processed sequences: 1566431
	Unprocessed sequences: 205939
	Found centers: 15909
	Processed sequences: 1610994
	Unprocessed sequences: 211989
	Found centers: 16228

meshclust3.sh.e539214:

Mean 1 (mean1) and Mean 2 (mean2) cannot be zeros. Mean 1 is: 0, mean 2 is: 0.226562

terminate called after throwing an instance of 'std::exception'
  what():  std::exception
/opt/gridengine/default/spool/compute-0-0/job_scripts/539214: line 1: 229599 Aborted                 (core dumped) /home/gmeng/soft/MeShClust_v3/Identity/bin/meshclust -d combined.fa -t 0.6 -o out.clstr -c 80 -e y -a n -p 10
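The error message says that one of the two means being compared is exactly 0. A plausible reading, consistent with the log's "K: 4" and "Histogram size: 256" (4^4 = 256), is that each mean is the average count over one sequence's 4-mer histogram; this is an assumption, not something confirmed from MeShClust's source. Under that assumption, a mean of 0 points to a sequence with no countable 4-mer at all (shorter than 4 nt, or made only of N and other non-ACGT symbols), and mean 2 = 0.226562 matches 58/256, i.e. a sequence with 58 countable 4-mers. The bash function below is a hypothetical sketch (`kmer_hist_mean` is not a MeShClust command) that computes this quantity for a single sequence:

```shell
# Hypothetical illustration, NOT MeShClust code: assumes "mean" is the average
# count over the 4^k bins of a sequence's k-mer histogram (k = 4 by default).
kmer_hist_mean() {
  local seq="$1" k="${2:-4}"
  awk -v s="$seq" -v k="$k" 'BEGIN {
    s = toupper(s)
    bins = 4 ^ k                      # 256 bins for k = 4
    valid = 0
    # Slide a window of width k; count only windows made of A, C, G, T.
    for (i = 1; i + k - 1 <= length(s); i++)
      if (substr(s, i, k) ~ /^[ACGT]+$/) valid++
    printf "%.6f\n", valid / bins
  }'
}
```

For example, `kmer_hist_mean NNNNNNNN` and `kmer_hist_mean ACG` both print 0.000000, while `kmer_hist_mean ACGTACGT` prints 0.019531 (5 valid windows / 256). This sketch skips any window containing a non-ACGT symbol; how MeShClust itself treats ambiguous bases may differ.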
@hani-girgis
Member

Hi, Guanliang.

Thanks for your interest in MeShClust.

I suspect that one of the sequences in combined.fa is too short or has many uncertain nucleotides, e.g., N. Can you please verify and let me know?
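One way to follow up on this suggestion is to scan combined.fa for such records. The bash function below is a hypothetical helper, not part of MeShClust: it prints the ID, length and non-ACGT fraction of every record that is shorter than `MIN_LEN` or whose non-ACGT fraction is at least `MAX_N_FRAC`. The default thresholds (100 nt and 0.5) are arbitrary illustrations, not values required by MeShClust.

```shell
# Hypothetical helper, not part of MeShClust. Thresholds are illustrative:
#   MIN_LEN    flag records shorter than this many characters (default 100)
#   MAX_N_FRAC flag records whose non-ACGT fraction is >= this (default 0.5)
scan_fasta() {
  local min_len="${MIN_LEN:-100}" max_n="${MAX_N_FRAC:-0.5}"
  awk -v min_len="$min_len" -v max_n="$max_n" '
    # Report the record currently held in id/seq if it looks suspicious.
    function report() {
      if (id == "") return
      len = length(seq)
      # gsub returns how many characters other than A/C/G/T (any case) it removed.
      bad = gsub(/[^ACGTacgt]/, "", seq)
      frac = (len > 0) ? bad / len : 1
      if (len < min_len || frac >= max_n)
        printf "%s\tlength=%d\tnon_ACGT_fraction=%.3f\n", id, len, frac
    }
    /^>/ { report(); id = substr($1, 2); seq = ""; next }
    { seq = seq $0 }           # concatenate wrapped sequence lines
    END { report() }
  ' "$@"
}
```

For example, `scan_fasta combined.fa > suspicious.tsv` writes one tab-separated line per flagged record, and `MIN_LEN=4 scan_fasta combined.fa` restricts the length check to records too short to contain a single 4-mer (N-rich records are still flagged). The scan is a single pass and holds only one record in memory at a time.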

Best regards.

Hani Z. Girgis, PhD
