new behavior for protein k-mer size calculations - gathering the issues together. #999

ctb · 2020-05-25T15:42:35Z

Note: updated in #1525

when taking in DNA sequence to translate into protein, how should sourmash treat the k-mer size?

right now,

sourmash on DNA sequence -> DNA minhash uses exact k-mer size (k=21, 31, 51, or whatever) does reverse complement calculations. (src/core/src/sketch/minhash.rs::add_sequence(), if self.is_dna() block)
sourmash on protein sequence -> protein minhash uses ksize / 3, with no reverse complement. (src/core/src/sketch/minhash.rs::add_protein()).
sourmash on DNA sequence -> protein minhash uses ksize / 3, with reverse complement. (src/core/src/sketch/minhash.rs::add_sequence(), else block)

this is an enduring source of confusion, so should be fixed. and there are a complex set of issues in the issue tracker here...

Change --protein k-size meaning #574 has a good discussion of some options. Note also Allow all ksizes for protein signature calculation if input is protein #575.
Document and/or improve protein MinHash API. #720 is another good discussion of the problem
Setting "scaled=1" doesn't retrieve all protein k-mers #701 points out user confusion around this!
with --protein, translate whole sequence and then emit k-mers #664 requests a change so that the translated add_sequence stuff translates the entire read into aa first, and then breaks it down into k-mers. this could help with losing information from the end of reads Pad non-multiples-of-three nucleotide sequences for protein translation and deal with codon degeneracy elegantly #659.
Add option for strand-specific / 3-frame translation for --protein #657 points out that we do not actually deal with RNA very well in translate mode.
add_sequence method for MinHash #186 looks like a potential bug (or at least suboptimal UX) around add_sequence
we'd need to update Document DNA, protein, input_is_protein calculations somewhere #59 so that we have a simple explanation and Python code for this.

The text was updated successfully, but these errors were encountered:

ctb · 2020-05-25T15:50:13Z

note that such changes would trigger a new major version, so ~now-ish is a good time...

bluegenes · 2020-05-25T19:25:59Z

I just went through the issues above and tried to summarize here:

desired functionality:

Calculate protein signatures with exact ksize (not k=k/3), both at python api and on cli. Translated signatures still use k=k/3.
Translated k=33 should be comparable with protein k=11 (equivalent ksizes). can we just use the hash function to flag incompatible signatures, instead of DNA/protein/etc? #751 could facilitate this. In Change --protein k-size meaning #574, @luizirber pointed out that can we just use the hash function to flag incompatible signatures, instead of DNA/protein/etc? #751 is already enabled in the rust code?
Potential translation improvements:
- translate whole sequence, then emit kmers (with --protein, translate whole sequence and then emit k-mers #664)
- pad translated seqs (Pad non-multiples-of-three nucleotide sequences for protein translation and deal with codon degeneracy elegantly #659)
- potentially add 3-frame translation (Add option for strand-specific / 3-frame translation for --protein #657)
Potentially enhance input checks for add_sequence and add_protein? From Document and/or improve protein MinHash API. #720: "in the provided example in Setting "scaled=1" doesn't retrieve all protein k-mers #701, I'm not sure why add_sequence doesn't complain when it's given a bunch of non-ACTG characters."

proposed solutions:

In #574, @ctb proposed splitting dna and protein signature calculation. This could be a good cli solution: each signature method would use exact ksize, except when translating (translate k = k/3).

In line with comments in #720, I propose changing the add_protein code to work directly on the provided ksize, so the functionality would change both at the cli and python api levels. #186 suggests an alternate method that would merge add_sequence and add_protein, but the current implementation (as separate methods) may be clearer and facilitate input checks?

Note that the change in #576 (issue #575) actually removed good error checking for the current working implementation, but will be accurate for the proposed changes.

luizirber · 2020-05-25T20:29:38Z

Translated k=33 should be comparable with protein k=11 (equivalent ksizes). can we just use the hash function to flag incompatible signatures, instead of DNA/protein/etc? #751 could facilitate this. In Change --protein k-size meaning #574, @luizirber pointed out that can we just use the hash function to flag incompatible signatures, instead of DNA/protein/etc? #751 is already enabled in the rust code?

It is enabled in the Rust code (see here), it converts the is_protein/hp/dayhoff booleans into the right hash_function enum.

This is the commit that would change it in Python (expose the HashFunction enum in Python). It is in the rfc/751_rust branch, but can be merged without changing the public Python API.

I think we should expose the HashFunction as a parameter in MinHash.__init__, replacing the
is_protein/hp/dayhoff for 4.0, but that would break all the existing code.
Current __init__:

def __init__(self, n, ksize, is_protein=False, dayhoff=False, hp=False, track_abundance=False, seed=MINHASH_DEFAULT_SEED, max_hash=0, mins=None, scaled=0):

Change to:

def __init__(self, ksize, scaled=1000, hash_function=HashFunctions.murmur64_DNA, track_abundance=False, seed=MINHASH_DEFAULT_SEED, mins=None, *, n=0, is_protein=False, dayhoff=False, hp=False):

Note: these changes also

regular MinHash are not the default (n=0) anymore
force is_protein/hp/dayhoff to be keyword-only arguments
drops max_hash (it already throws a deprecation warning)

We can also potentially move the * and make all the keywords into keyword-only, like this

def __init__(self, ksize, *, scaled=1000, hash_function=HashFunctions.murmur64_DNA, track_abundance=False, seed=MINHASH_DEFAULT_SEED, mins=None, n=0, is_protein=False, dayhoff=False, hp=False):

and so the only required argument would be ksize, and all the other should be used explicitly. More info: https://www.asmeurer.com/python3-presentation/slides.html#26

olgabot · 2020-05-25T22:18:57Z

Thanks @ctb @bluegenes @luizirber for the discussion so far!

Jumping on @bluegenes's comment:

Calculate protein signatures with exact ksize (not k=k/3), both at python api and on cli. Translated signatures still use k=k/3.

Haha, this breaks my workflow :) I do a good amount of comparisons of DNA vs protein, e.g. DNA k=33 and protein k=33 (11). Maybe something like a separate "protein ksize" parameter?

Potential translation improvements:

translate whole sequence, then emit kmers (with --protein, translate whole sequence and then emit k-mers #664)

pad translated seqs (Pad non-multiples-of-three nucleotide sequences for protein translation and deal with codon degeneracy elegantly #659)

potentially add 3-frame translation (Add option for strand-specific / 3-frame translation for --protein #657)

@bluegenes what do you think of the reverse complement protein k-mers? I hadn't thought about that.

I've personally moved completely away from sourmash's translation to sencha because there's no guarantee that the 11-mer protein without stop codons is actually the "right" one. So while it's convenient to have the translation within sourmash I find myself not using it.

Potentially enhance input checks for add_sequence and add_protein? From Document and/or improve protein MinHash API. #720: "in the provided example in Setting "scaled=1" doesn't retrieve all protein k-mers #701, I'm not sure why add_sequence doesn't complain when it's given a bunch of non-ACTG characters."

Yes, some user feedback on "I saw some letters I didn't like" would be helpful.

proposed solutions:

In #574, @ctb proposed splitting dna and protein signature calculation. This could be a good cli solution: each signature method would use exact ksize, except when translating (translate k = k/3).

I think this makes the most sense. Handling ksize and molecule/alphabet is a unique use case for proteins (though I've seen reduced nucleotide alphabets like Weak/Strong, Amino/Keto, Purine/Pyrimidine be useful for e.g. noncoding RNA homology...), I think it's worth separating to avoid confusion.

In line with comments in #720, I propose changing the add_protein code to work directly on the provided ksize, so the functionality would change both at the cli and python api levels. #186 suggests an alternate method that would merge add_sequence and add_protein, but the current implementation (as separate methods) may be clearer and facilitate input checks?

Yes to this!

FWIW, my current main workflow is as in nf-core/kmermaid:

Preprocess reads with trimming, FASTQC, etc
Translate RNA-seq reads to protein-coding sequences with sencha translate, creating coding_nucleotides.fasta and coding_peptides.fasta files for each sequence
Generate nucleotide signatures of the coding nucleotides with sourmash sketch
Generate protein signatures of the coding peptides with sourmash sketch, at the corresponding ksizes the nucleotides.
(optional sourmash compare that I lately haven't been doing because I need to compare multiple runs together later)

Hope that helps! Thanks for bringing these issues up.

ctb · 2020-05-26T13:22:42Z

hi @olgabot thanks! I'm confused by two things --

Haha, this breaks my workflow :) I do a good amount of comparisons of DNA vs protein, e.g. DNA k=33 and protein k=33 (11).

I don't understand what kinds of computation or comparison you're doing here - are you doing translations of DNA sequence into aa at DNA ksize=33/aa ksize 11, as well as protein sequence at ksize=33 (aa ksize 11), and then comparing them?

separately, "this breaks my workflow" - is that a negative, or a neutral? You'd be able to pin sourmash to > 2 and < 4 and still use it as normal until you migrated your workflows to sourmash 4 explicitly.

olgabot · 2020-05-26T17:19:23Z

hi @olgabot thanks! I'm confused by two things --

Haha, this breaks my workflow :) I do a good amount of comparisons of DNA vs protein, e.g. DNA k=33 and protein k=33 (11).

I don't understand what kinds of computation or comparison you're doing here - are you doing translations of DNA sequence into aa at DNA ksize=33/aa ksize 11, as well as protein sequence at ksize=33 (aa ksize 11), and then comparing them?

The comparison is done on the Jaccard similarities after sourmash compare, i.e. how similar are these two transcriptomes if you use DNA or protein sequences, and if protein, what if you reencode to dayhoff or HP? How do the similarities increase/decrease? How does this track with evolutionary distance of the transcriptomes? These are the kinds of questions I've been thinking about :)

separately, "this breaks my workflow" - is that a negative, or a neutral? You'd be able to pin sourmash to > 2 and < 4 and still use it as normal until you migrated your workflows to sourmash 4 explicitly.

Ah it was supposed to be a joking neutral. I can update the pipeline code, it already treats nucleotide vs peptide sequences differently so it's not a big deal 👍

ctb · 2020-05-30T14:13:48Z

OK, I did some playing around this morning and have a better understanding of just how terrible all of this is 😂

sourmash protein stuff

starting from

GCA_001593925 - d__Archaea,p__Crenarchaeota,c__Bathyarchaeia,o__B26-1,f__B26-1,g__B26-1,s__B26-1 sp001593925

GCA_001593925.1_ASM159392v1_cds_from_genomic.fna.gz
GCA_001593925.1_ASM159392v1_genomic.fna.gz
GCA_001593925.1_ASM159392v1_protein.faa.gz

and

GCA_001593935 - d__Archaea,p__Crenarchaeota,c__Bathyarchaeia,o__B26-1,f__B26-1,g__B26-1,s__B26-1 sp001593935

GCA_001593935.1_ASM159393v1_cds_from_genomic.fna.gz
GCA_001593935.1_ASM159393v1_genomic.fna.gz
GCA_001593935.1_ASM159393v1_protein.faa.gz

notes --

at k=33 sourmash compare *v1_genomic.fna.gz.sig shows no similarity
sourmash compute -k 33 *.faa.gz should not have worked without error or warning! (computes DNA signatures!)
sourmash compute -k 33 *.faa.gz --protein --no-dna works but does the wrong thing - assumes input sequences are DNA!
sourmash compute -k 11,33 --input-is-protein *.faa.gz - which ksize do we end up with here?
sourmash compute -k 33 --protein *_cds_from_genomic.fna.gz works but does 6 frame translation. (and how would sourmash know the right frame?)

ctb · 2020-06-18T16:00:03Z

yeah, continued terribleness when building lots of protein signatures. We need better error checking and UX improvements around "how could you possibly have meant to do this?"

We may want to change some of the default parameters when doing protein calculations. This kind of actually argues for a different command line entry point for protein computations, e.g. something other than compute. 🤔 New thought... will sit on it to see where I come out.

bluegenes · 2020-06-18T17:25:55Z

some additional thoughts:

protein, dayhoff, hp signatures work well at different ranges of ksizes. If you want to calculate all of them together, you end up with a series of useless signatures (e.g. want protein k=21, but hp at k=21 is relatively useless)
If instead you calculate them separately, you need different files. For me this ends up practically with one file per ksize per alphabet. Not a big deal, but lots of files.

Potential solutions:

easy: signature cat utility that allows you to cat sigs into a single file after creating them separately (if they have identical file metadata)
harder, perhaps confusing at cli level: enable diff ksizes per alphabet

olgabot · 2020-06-18T17:45:30Z

some additional thoughts:

protein, dayhoff, hp signatures work well at different ranges of ksizes. If you want to calculate all of them together, you end up with a series of useless signatures (e.g. want protein k=21, but hp at k=21 is relatively useless)

If instead you calculate them separately, you need different files. For me this ends up practically with one file per ksize per alphabet. Not a big deal, but lots of files.

Potential solutions:

easy: signature cat utility that allows you to cat sigs into a single file after creating them separately (if they have identical file metadata)

Does sourmash sig merge solve this?

harder, perhaps confusing at cli level: enable diff ksizes per alphabet

YES YES YES to all of this. I also end up generating 1 minhash/.sig file because of the alphabet-vs-ksize usefulness. There's some log math one can do to get the equivalent ksize for a different alphabet of size sigma, relative to the protein ksize k

sigma ^ k_new = 20 ^ k
k_new * log(sigma) = k * log(20)
k_new = k * log(20) / log(sigma)

Here's an implementation:

import math

PROTEIN_SIGMA = 20

def get_equivalent_protein_ksize(protein_ksize, sigma):
    return int(math.ceil(protein_ksize * math.log(PROTEIN_SIGMA) / math.log(sigma)))

get_equivalent_protein_ksize(protein_ksize=5, sigma=6)

bluegenes · 2020-06-18T18:07:55Z

Does sourmash sig merge solve this?

Unless I'm mistaken, merge doesn't solve bc it's designed to output the union of all the hashes in file1.sig and file2.sig to merged.sig

Though I guess I don't know what the behavior is if merging files with different ksizes (that thus have no union)...

ctb · 2020-06-18T20:57:00Z

Random thought - what about a syntax like:

sourmash compute --dayhoff k=19,scaled=200 k=21,scaled=200 --protein k=21,scaled=200

etc?

olgabot · 2020-06-19T02:05:45Z

I like it!

ctb · 2020-06-21T15:25:18Z

after working through logic for #1037, I think there are multiple opportunities for checks on alphabet (DNA vs prot) at the API level.

interim thoughts:

I think we do need both command line and API level checking. The command line can make use of additional info (filename, aggregated across sequences, etc) while the API has to do the trickier job of working with only the sequence it's given.
I am leaning towards add_dna_sequence and add_protein_sequence at the API level;
it's not 100% clear how robust it will be to check that any given k-mer is DNA vs prot;
one strategy might be to look at what fraction of k-mers are valid alphabet;
I'm strangely challenged by the notion of separating compute-with-input-is-DNA and compute-with-input-is-protein at the command line.
- we may need a third, right? input-is-RNA? that would do translation? but we don't really know how to find the right reading frame just yet, so that should probably be punted until sencha and other approaches are fully validated.
- I know olga has proposed changing compute to sketch in rename 'compute' to 'sketch' or compare to similarity? #663, I don't really know how to evaluate that, but it COULD be part of this change, e.g. sourmash sketch --dna and sourmash sketch --protein or something else terrible like that :). maybe dnasketch and aasketch/protsketch? to be later followed by rnasketch?
- and/or we could limit compute to DNA in 4.0, and add a new separate command for the current --input-is-protein behavior, like aasketch or protsketch

ctb · 2020-06-21T16:51:56Z

more thoughts --

I think we can proceed slowly - maybe even in another 3.x release - by adding a new top-level aasketch or whatever command. This would prototype new command line behavior, e.g. sourmash aasketch --dayhoff k=19,scaled=200 k=21,scaled=200, while preserving old behavior, and would let us play a little bit with these suggestions before locking us into a new release. Could do same with dnasketch or similar command.

luizirber · 2020-06-21T17:42:46Z

sourmash compute --dayhoff k=19,scaled=200 k=21,scaled=200 --protein k=21,scaled=200

I'm a bit worried this will get quite hard to parse...
A pro is that we have all the info beforehand, so it is also possible to explore more parallelism. In this example there are 3 distinct sketches being calculated, and they can all the done in parallel.

Another idea: unique-kmers.py from khmer has a passthrough option (-S, --stream-records) for using it as a filter in the middle of a pipeline. If there was a flag like that for sourmash compute it would also be possible to build many sketches by varying parameters:

sourmash compute --dayhoff -k 19,21 --scaled=200 -S -o dayhoff.sig | \
sourmash compute --protein -k=21 --scaled=200 -S -o protein.sig | \
sourmash compute --dna -k=21 --scaled=1000 -o dna.sig

In this case parallelism is achieved by piping more processes. Drawback is that it is longer and more repetitive, and depends on shell piping.

ctb · 2020-06-23T22:52:21Z

hmm. I really like having this diversity of options, and I kinda like all of them, together with a 'cat' style command for sourmash sig to concatenate individual sig files.

re

I'm a bit worried this will get quite hard to parse...

I'm more concerned about the UX! I want to make simple things simple, and hard things possible...

ctb · 2020-07-03T15:20:02Z

musing out loud about serialization - if we change the meaning of ksize in protein signatures, how do we deal with loading existing signature files (where the ksize is 3x what it will be)? our options are to change molecule type (to protein2, or something - ugly!), or to change the sourmash signature version, I think. Changing the version seems better to me. That would also allow us to upgrade signatures with sig cat, etc. etc.

separate thought - if we have two different commands, dnasketch and aasketch, where does the translate-this-dna-sequence command go? in dnasketch because we're taking in DNA sequences, or in aasketch because we're producing protein signatures?

olgabot · 2020-07-06T16:00:31Z

To be consistent with previous versions, can the ksize variable be renamed, e.g. protein_ksize or pksize ? Where protein_ksize = ksize/3?

ctb · 2020-07-07T15:03:41Z

To be consistent with previous versions, can the ksize variable be renamed, e.g. protein_ksize or pksize ? Where protein_ksize = ksize/3?

hot takes --

sounds like a lot of confusion, special casing, and support burden to add to sourmash internals!
I think only you and tessa are using protein k-mers...?
so IMO it's much, much easier to keep backwards compatibility in the signature loading code, and move forward with changing the meaning of ksize for protein/dayhoff/hp in sourmash 4.0.

ctb · 2020-08-10T14:41:16Z

@olgabot @bluegenes @luizirber ok, in #1159 I've added (what appears to be) a fully functional sourmash sketch set of commands.

sourmash sketch dna|rna <sketch params> [ files ]

sourmash sketch aa|prot|protein <sketch params> [ files]

sourmash sketch translate <sketch params> [ files ]

See the description there for (lots) more details! I think it's a much less confusing user experience, but I welcome your feedback, pro or con!

Note that I did implement better ksize handling (for protein and translate, it's now the "true" amino acid k-mer size) but I didn't do anything magic with ksize in the internals, so if you construct a signature with sketch protein -k 11 you end up with a signature file that has a ksize of 33. That can be fixed later - my main question is whether this set of changes makes sense.

ctb · 2020-08-14T16:30:37Z

here are some notes I took while working through #1159.

Changes to signature computation

Suggested changes to signature computation:

translate whole read per with --protein, translate whole sequence and then emit k-mers #664
support top strand/bottom strand only for translate RNA input Add option for strand-specific / 3-frame translation for --protein #657

Signature JSON format changes

What signature format changes should we do in tandem? see #268 for rollup issue

ksize change for protein Change --protein k-size meaning #574
move hash function into minhash section?
rename 'signatures' list to 'minhashes'?
support actual command-line concatenation of signatures? investigate modifications to JSON parsing to allow cat to work directly #1093
could add bp and input file list per add number of bp & sequences => signature file #246, Add documentation to sourmash signature describe for source file origin #769
ould also add information about preprocessing per Provide a standard format / provenance info for commands used in data preprocessing #269 on a per-command basis; OR, just add information e.g. input type was genome, input type was reads, etc. based on command executed.

Changes to MinHash API

general issue here, #338
more here, #720
and more here, #885, although that is mostly about docs and tests now.

consider lifting things out to signature, too, per Better conceptual definition on minhash/signature/query #616
switch to using moltype enums in creation update MinHash attributes/properties for consistency, and/or switch to using moltype attribute #1136
add_sequence changes per add_sequence method for MinHash #186? add_sequence(<str>, moltype=...)
- moltype should be enum
- moltype could be assumed to be DNA for DNA minhashes, aa|prot|protein for protein minhashse.
- could have it be dna|rna_top|rna_bottom for protein minhashes to translate incoming?
do we want to make different MinHash classes for different moltypes?
- protein MinHashes are more complicated than DNA..
- the default ksize/scaled/etc could mirror the CLI defaults

ctb · 2020-08-14T18:42:46Z

also a question to ask re #1159, how easy/hard does this make snakemake recipes?

bluegenes · 2020-08-31T22:04:06Z

I've been using the new sketching a fair bit now - going well so far.

Re: snakemake: If you're generating a single type of signature, it's not hard. It gets more complex if you want to enable flexibility for sketching with dna, translate, or protein (just requires multiple rules or a function to generate the params). Generating a single sketch file can save on downstream snakemake complexity, so when I want both nucleotide and translated signatures, I've taken to sketching each and then using sig cat to combine them into a single signature.

On ksizes: The different ksizes for sketching vs internal params is a bit frustrating (still need to pass k*3 when selecting sigs) - consider this my vote in for changing that for 4.0. Thoughts on ksize, @olgabot @ctb ? Or should I move this discussion to #1169 ?

ctb · 2020-08-31T22:46:04Z

I've been using the new sketching a fair bit now - going well so far.

great!

Re: snakemake: If you're generating a single type of signature, it's not hard. It gets more complex if you want to enable flexibility for sketching with dna, translate, or protein (just requires multiple rules or a function to generate the params). Generating a single sketch file can save on downstream snakemake complexity, so when I want both nucleotide and translated signatures, I've taken to sketching each and then using sig cat to combine them into a single signature.

hmm, interesting. I wonder if there's a way to make this easy in a single go... will think on't.

On ksizes: The different ksizes for sketching vs internal params is a bit frustrating (still need to pass k*3 when selecting sigs) - consider this my vote in for changing that for 4.0. Thoughts on ksize, @olgabot @ctb ? Or should I move this discussion to #1169 ?

just thinking out loud - it'd be easy to fix the CLI and the Python API, I think, without changing the saving/loading. Maybe I'll try doing that as a first step.

ctb · 2021-03-04T13:55:48Z

The CLI and Python API were updated to use new protein ksizes in #1315.

This issue needs a serious revisit to glean the remaining ideas, perhaps to be put into new issues!

ctb · 2021-05-15T13:26:45Z

replaced by #1525

This was referenced Jun 19, 2020

What defaults do we want? #219

Closed

MinHash initialization #338

Open

add DNA/protein check on first 100 bp of FASTA files #1037

Open

ctb mentioned this issue Jun 24, 2020

[MRG] add signature cat and signature split commands to combine/split signature files #1044

Merged

11 tasks

ctb mentioned this issue Jul 8, 2020

how should we adjust output when using 6-frame-translated signatures? #1087

Open

ctb mentioned this issue Jul 25, 2020

[MRG] MinHash class refactoring #1128

Merged

19 tasks

ctb mentioned this issue Aug 9, 2020

[MRG] Explore addition of a sketch command. #1159

Merged

5 tasks

ctb mentioned this issue Aug 14, 2020

potential next steps for the sourmash sketch command #1169

Open

ctb mentioned this issue Jan 10, 2021

adjust protein k-mer sizes for sourmash sketch? #1271

Closed

ctb added the revisit_me An issue that needs attention and clarification label Mar 4, 2021

This was referenced May 15, 2021

Document and/or improve protein MinHash API. #720

Open

[EXP] explore alternative MinHash constructors, and other things. #1510

Open

summary: further improvements to protein handling in sourmash #1525

Open

ctb closed this as completed May 15, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

new behavior for protein k-mer size calculations - gathering the issues together. #999

new behavior for protein k-mer size calculations - gathering the issues together. #999

ctb commented May 25, 2020 •

edited

Loading

ctb commented May 25, 2020

bluegenes commented May 25, 2020 •

edited

Loading

luizirber commented May 25, 2020

olgabot commented May 25, 2020 •

edited

Loading

ctb commented May 26, 2020

olgabot commented May 26, 2020

ctb commented May 30, 2020

ctb commented Jun 18, 2020

bluegenes commented Jun 18, 2020 •

edited

Loading

olgabot commented Jun 18, 2020 •

edited

Loading

bluegenes commented Jun 18, 2020

ctb commented Jun 18, 2020

olgabot commented Jun 19, 2020

ctb commented Jun 21, 2020 •

edited

Loading

ctb commented Jun 21, 2020

luizirber commented Jun 21, 2020 •

edited

Loading

ctb commented Jun 23, 2020

ctb commented Jul 3, 2020 •

edited

Loading

olgabot commented Jul 6, 2020

ctb commented Jul 7, 2020

ctb commented Aug 10, 2020

ctb commented Aug 14, 2020

ctb commented Aug 14, 2020

bluegenes commented Aug 31, 2020

ctb commented Aug 31, 2020

ctb commented Mar 4, 2021

ctb commented May 15, 2021

new behavior for protein k-mer size calculations - gathering the issues together. #999

new behavior for protein k-mer size calculations - gathering the issues together. #999

Comments

ctb commented May 25, 2020 • edited Loading

ctb commented May 25, 2020

bluegenes commented May 25, 2020 • edited Loading

luizirber commented May 25, 2020

olgabot commented May 25, 2020 • edited Loading

ctb commented May 26, 2020

olgabot commented May 26, 2020

ctb commented May 30, 2020

sourmash protein stuff

ctb commented Jun 18, 2020

bluegenes commented Jun 18, 2020 • edited Loading

olgabot commented Jun 18, 2020 • edited Loading

bluegenes commented Jun 18, 2020

ctb commented Jun 18, 2020

olgabot commented Jun 19, 2020

ctb commented Jun 21, 2020 • edited Loading

ctb commented Jun 21, 2020

luizirber commented Jun 21, 2020 • edited Loading

ctb commented Jun 23, 2020

ctb commented Jul 3, 2020 • edited Loading

olgabot commented Jul 6, 2020

ctb commented Jul 7, 2020

ctb commented Aug 10, 2020

ctb commented Aug 14, 2020

Changes to signature computation

Signature JSON format changes

Changes to MinHash API

ctb commented Aug 14, 2020

bluegenes commented Aug 31, 2020

ctb commented Aug 31, 2020

ctb commented Mar 4, 2021

ctb commented May 15, 2021

ctb commented May 25, 2020 •

edited

Loading

bluegenes commented May 25, 2020 •

edited

Loading

olgabot commented May 25, 2020 •

edited

Loading

bluegenes commented Jun 18, 2020 •

edited

Loading

olgabot commented Jun 18, 2020 •

edited

Loading

ctb commented Jun 21, 2020 •

edited

Loading

luizirber commented Jun 21, 2020 •

edited

Loading

ctb commented Jul 3, 2020 •

edited

Loading