CLI: Better subcommand structure #78

Open
karel-brinda opened this issue Oct 18, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@karel-brinda
Collaborator

karel-brinda commented Oct 18, 2024

I'm testing the latest version, and it's increasingly obvious to me that we'll need to restructure the CLI. Currently it's confusing, and likely not permanent.

Specifically, we need a good structure of well-separated subcommands. It might also be useful to have a dedicated MS-specific file suffix, e.g., .msfa.

Usage should be simpler, e.g.:

kmercamel ms -k 31 genome.fa > genome.msfa
kmercamel optimize genome.msfa > genome_maskopt.msfa
kmercamel reformat -m mask.txt -s superstring.txt genome_maskopt.msfa
kmercamel reformat -P mask.txt -S superstring.txt > glued.msfa
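To make the proposed layout concrete, here is a minimal sketch of the subcommand structure using Python's argparse. It is not the actual kmercamel implementation. The subcommand and flag names are taken from the examples above; the reading of -m/-s as output files and -P/-S as input files, and all `dest` names, are assumptions made for illustration.

```python
import argparse


def build_parser():
    # Hypothetical layout mirroring the commands proposed in this issue;
    # this is NOT the real kmercamel command-line interface.
    parser = argparse.ArgumentParser(prog="kmercamel")
    sub = parser.add_subparsers(dest="command", required=True)

    # kmercamel ms -k 31 genome.fa
    ms = sub.add_parser("ms", help="compute a masked superstring")
    ms.add_argument("-k", type=int, required=True, help="k-mer size")
    ms.add_argument("input", help="input FASTA file")

    # kmercamel optimize genome.msfa
    opt = sub.add_parser("optimize", help="re-optimize the mask")
    opt.add_argument("-k", type=int, default=None,
                     help="k-mer size; inferred from the input when omitted")
    opt.add_argument("input", help="masked superstring file")

    # kmercamel reformat ... (assumed: -m/-s write files, -P/-S read files)
    ref = sub.add_parser("reformat", help="split or glue mask and superstring")
    ref.add_argument("-m", dest="mask_out")
    ref.add_argument("-s", dest="superstring_out")
    ref.add_argument("-P", dest="mask_in")
    ref.add_argument("-S", dest="superstring_in")
    ref.add_argument("input", nargs="?", default=None)
    return parser


args = build_parser().parse_args(["ms", "-k", "31", "genome.fa"])
print(args.command, args.k, args.input)
```

Keeping each subcommand's options on its own sub-parser means that an option such as -k can be required for ms but optional for optimize.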

Notes:

  • -c should be on by default; in fact, there are 3 possible modes of work: canonical k-mers, forward k-mers, reverse k-mers
  • for optimization, -k should be parsed automatically from the superstring (by default), and this extraction should be reported in a clearly visible message (e.g., "KMER SIZE EXTRACTED: 31")
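One way the automatic extraction of k could work is sketched below. It rests on two assumptions that are not stated in this issue: that mask bits are encoded by letter case (uppercase = 1, lowercase = 0), and that the superstring ends with exactly k-1 lowercase characters preceded by an uppercase one. If the last represented k-mer does not end at the end of the superstring, this heuristic overestimates k. The function name `infer_k` is hypothetical.

```python
import sys


def infer_k(masked_superstring: str) -> int:
    """Infer k as (length of the trailing lowercase run) + 1.

    ASSUMPTION: mask bits are encoded by letter case, and the last k-1
    positions are lowercase because no k-mer can start there.
    """
    trailing = 0
    for ch in reversed(masked_superstring):
        if not ch.islower():
            break
        trailing += 1
    if trailing == len(masked_superstring):
        raise ValueError("no uppercase character found; cannot infer k")
    k = trailing + 1
    # Report on stderr so the message does not end up in redirected output.
    print(f"KMER SIZE EXTRACTED: {k}", file=sys.stderr)
    return k


k = infer_k("ACgTac")  # two trailing lowercase characters, so k is 3
```

Writing the message to stderr keeps it visible in the terminal even when stdout is redirected to a file, as in the usage examples above.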

What do you think @OndrejSladky @PavelVesely ?

karel-brinda changed the title from "Better CLI" to "CLI: Better subcommand structure" on Oct 18, 2024
karel-brinda added the enhancement label on Oct 18, 2024
@karel-brinda
Collaborator Author

Ok, immediately after writing this ticket, I completely forgot the -c param in the command I was running, which likely made the computation much slower. This really needs to be fixed :) (I guess ~90% of users forget this as well.)

@PavelVesely
Collaborator

One more thing: it'd be great to compute the MS and optimize the mask in a single command. We would avoid storing and loading the non-optimized MS from disk, and it would be simpler to measure the time and memory requirements of both steps if they were executed by a single command.

@karel-brinda
Collaborator Author

We actually discussed this at the previous meeting. This was the issue: while for max ones it's simple, there are many things that can go wrong for min int, and there's a risk that the whole MS computation can be lost due to an error in the final optimization part.
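The risk described here can be partly reduced inside a single command by guarding the optimization step, as in the sketch below. `compute_ms` and `optimize_mask` are hypothetical callables standing in for the real steps; nothing here reflects the actual kmercamel code.

```python
import sys


def compute_and_optimize(compute_ms, optimize_mask, sequences, k):
    """Run both steps in one process; keep the unoptimized MS on failure.

    compute_ms and optimize_mask are hypothetical stand-ins for the real
    computation and mask optimization steps.
    Returns (masked_superstring, optimized_flag).
    """
    ms = compute_ms(sequences, k)
    try:
        return optimize_mask(ms, k), True
    except (MemoryError, RuntimeError) as err:
        # The expensive MS result survives; only the optimization is skipped.
        print(f"mask optimization failed ({err}); keeping the default mask",
              file=sys.stderr)
        return ms, False
```

This only covers failures that surface as exceptions. If the process is killed outright (for example by the operating system when memory runs out), the result is still lost unless the unoptimized MS was written to disk first.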

@karel-brinda
Collaborator Author

Maybe a solution could be, once we have the MS command, to implement default / greedy zero / max ones as a param, and allow min int only in the reoptimization subcommand?

@PavelVesely
Collaborator

> We actually discussed this at the previous meeting. This was the issue: while for max ones it's simple, there are many things that can go wrong for min int, and there's a risk that the whole MS computation can be lost due to an error in the final optimization part.

Makes sense: optimizing the number of runs can fail due to large memory consumption, or it could simply run for a very long time (say, a few days, even if the default MS is computed in hours).

> Maybe a solution could be, once we have the MS command, to implement default / greedy zero / max ones as a param, and allow min int only in the reoptimization subcommand?

This would still be useful, but it's not critical.
