When running htc create_db with ~30k sequences, I get the following error:
Traceback (most recent call last):
  File "./htc", line 327, in <module>
    status = main()
  File "./htc", line 286, in main
    create_db.create_db(args.aligned_sequences, args.taxonomy, args.verbose, args.output, args.use_cm_align, args.template_al)
  File "/Users/milanese/Dropbox/PhD/bin/htc/bin/create_db.py", line 714, in create_db
    classifiers = train_all_classifiers(alignment, full_taxonomy)
  File "/Users/milanese/Dropbox/PhD/bin/htc/bin/create_db.py", line 356, in train_all_classifiers
    train_node_iteratively(node, sibilings, all_classifiers, alignment, full_taxonomy)
  File "/Users/milanese/Dropbox/PhD/bin/htc/bin/create_db.py", line 327, in train_node_iteratively
    train_node_iteratively(child, sibilings_child, all_classifiers, alignment, full_taxonomy)
  File "/Users/milanese/Dropbox/PhD/bin/htc/bin/create_db.py", line 339, in train_node_iteratively
    all_classifiers, alignment, node)
  File "/Users/milanese/Dropbox/PhD/bin/htc/bin/create_db.py", line 314, in train_classifier
    clf.fit(X, y)
  File "/Users/milanese/miniconda3/lib/python3.7/site-packages/sklearn/linear_model/logistic.py", line 1532, in fit
    accept_large_sparse=solver != 'liblinear')
  File "/Users/milanese/miniconda3/lib/python3.7/site-packages/sklearn/utils/validation.py", line 719, in check_X_y
    estimator=estimator)
  File "/Users/milanese/miniconda3/lib/python3.7/site-packages/sklearn/utils/validation.py", line 542, in check_array
    allow_nan=force_all_finite == 'allow-nan')
  File "/Users/milanese/miniconda3/lib/python3.7/site-packages/sklearn/utils/validation.py", line 56, in _assert_all_finite
    raise ValueError(msg_err.format(type_err, X.dtype))
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
The issue is not that there are too many sequences; the issue is that the input contained some NaN values.
This happens because some genes are present in the taxonomy but not in the alignment. This is now solved in 778ef63.
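For anyone hitting the same error, here is a rough sketch of how the mismatch could be spotted before training. It is not the fix in 778ef63, and it does not use htc's real data structures: the function names, the gene IDs, and the dict-of-lists layout are all made up for illustration.

```python
import math

def missing_from_alignment(taxonomy_ids, alignment_ids):
    """Return the IDs listed in the taxonomy that have no row in the alignment."""
    return sorted(set(taxonomy_ids) - set(alignment_ids))

def rows_with_non_finite(features):
    """Return the IDs whose feature vector contains NaN or infinity."""
    return sorted(
        seq_id
        for seq_id, row in features.items()
        if any(not math.isfinite(value) for value in row)
    )

# Hypothetical toy data: gene_b is in the taxonomy but not in the alignment,
# and gene_c carries a NaN in its features.
taxonomy_ids = ["gene_a", "gene_b", "gene_c"]
alignment = {
    "gene_a": [1.0, 0.0],
    "gene_c": [0.0, float("nan")],
}

missing = missing_from_alignment(taxonomy_ids, alignment)
bad_rows = rows_with_non_finite(alignment)
print("in taxonomy but not in alignment:", missing)
print("rows with NaN or infinity:", bad_rows)
```

Running a check like this before clf.fit(X, y) would point at the offending IDs directly instead of failing inside scikit-learn's input validation.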
Note that it would still be good to have more balanced classes, so we have opened issue #8 for that.
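As a small illustration of the class balance point, the sketch below computes per-class weights with the n_samples / (n_classes * count) heuristic, which is the same formula scikit-learn applies for class_weight="balanced". Whether htc should use this (or resampling instead) is left to issue #8. The helper name and the toy labels are assumptions; the thread only gives the number of negative labels, so the real ratio is not reproduced here.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }

# Hypothetical toy labels: 9 negatives and 1 positive.
labels = [0] * 9 + [1] * 1
weights = balanced_class_weights(labels)
print(weights)
```

The rare class gets a proportionally larger weight, so its few examples count as much in the loss as the many examples of the common class.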
In the log, in particular, there are 33,086 negative labels, and training the classifier breaks. A related issue is that the classes are unbalanced.