-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding transformations.py #46
Conversation
expected_script = ( | ||
'process cp_0 {\n' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use triple-quote dedented string, so the result is copy/pastable
expected_script = ( | |
'process cp_0 {\n' | |
expected_script = ''' | |
process cp_0 { | |
... | |
''' |
for predecessor in dataflow_graph.predecessors(node): | ||
if isinstance(predecessor, FileNode): | ||
inputs.append(predecessor.fileName) | ||
input_vars.append(predecessor.fileName.replace(".", "_")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suppose a command inputs A.txt
and A_txt
. Both of these would get transformed to the same name causing a duplicate. Also, I think that there are more special characters we would need to avoid. Define another function with signature escape_filename_for_nextflow(filename: str) -> str
. We should try encoding the ith reserved character to _i
, including underscore itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ve handled .
,-
, and _
so far. Are there any other special characters I should consider?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume that those Nextflow input-names have to be valid Groovy identifiers. Please correct me if this is wrong. In that case, the Groovy documentation says:
Identifiers start with a letter, a dollar or an underscore... Then following characters can contain letters and numbers.
I found the set of disallowed characters in a Linux path in this Stack Overflow post, but I think that it is truly defined in the POSIX standard. Any character other than null-byte and /
are allowed. Lest we think "surely no application would use __", consider that Vim, Emacs, Nano use ~
in the filename of backups, some web crawlers leave ?
, &
, :
, and %
in the filenames based on URLs that get downloaded, scientific programs may use all-number filenames, non-English speakers may use their language's special characters in their filenames, non-technical users use (
, )
, and space
in their filenames all the time.
So what other special characters to consider? All 256 ASCII characters that are not letters or numbers (since Groovy can understand those in an identifier) and null bytes or slash (since those are guaranteed not to appear in a filename). That's why I suggested the scheme of replacing any non-letter non-number by _i
where i
is the ASCII character code in hex. Then, we also have to escape _
, since we are using it as a special character. And as I learned today, if the path begins with a number, we should prepend an escape code, since Groovy IDs can't start with a number.
|
||
## Installation | ||
### Prerequisites | ||
Java 8 or later must be installed on your system. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use flake.nix to install Nexflow instead. This will ensure a consistent version of Nextflow and JRE as well. Add pkgs.nextflow
to flake.nix
and run nix develop
.
if char.isalnum(): # Keep letters and numbers unchanged | ||
escaped_filename.append(char) | ||
elif char == '_': # Escape underscores | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will erase the underscores. Rather than continue
, just append _{ord(char):02x}
Remove Rename Make sure that the workflow test is exercised when we type Make a CLI front-end for it, like This PR should not, however, be dependent on #35. If #35 is not ready when this PR is, the function that turns a The front-end CLI need not be tested, at this juncture; the "glass box" test you already have (testing the underlying functionality rather than the end-to-end/black-box functionality) is acceptable at this point. |
No description provided.