Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding transformations.py #46

Closed
wants to merge 0 commits into from

Conversation

kyrillosishak
Copy link
Contributor

No description provided.

Comment on lines 290 to 291
expected_script = (
'process cp_0 {\n'
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use triple-quote dedented string, so the result is copy/pastable

Suggested change
expected_script = (
'process cp_0 {\n'
expected_script = '''
process cp_0 {
...
'''

for predecessor in dataflow_graph.predecessors(node):
if isinstance(predecessor, FileNode):
inputs.append(predecessor.fileName)
input_vars.append(predecessor.fileName.replace(".", "_"))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suppose a command inputs A.txt and A_txt. Both of these would get transformed to the same name causing a duplicate. Also, I think that there are more special characters we would need to avoid. Define another function with signature escape_filename_for_nextflow(filename: str) -> str. We should try encoding the ith reserved character to _i, including underscore itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ve handled .,-, and _ so far. Are there any other special characters I should consider?

Copy link
Owner

@charmoniumQ charmoniumQ Aug 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume that those Nextflow input-names have to be valid Groovy identifiers. Please correct me if this is wrong. In that case, the Groovy documentation says:

Identifiers start with a letter, a dollar or an underscore... Then following characters can contain letters and numbers.

I found the set of disallowed characters in a Linux path in this Stack Overflow post, but I think that it is truly defined in the POSIX standard. Any character other than null-byte and / are allowed. Lest we think "surely no application would use __", consider that Vim, Emacs, Nano use ~ in the filename of backups, some web crawlers leave ?, &, :, and % in the filenames based on URLs that get downloaded, scientific programs may use all-number filenames, non-English speakers may use their language's special characters in their filenames, non-technical users use (, ), and space in their filenames all the time.

So what other special characters to consider? All 256 ASCII characters that are not letters or numbers (since Groovy can understand those in an identifier) and null bytes or slash (since those are guaranteed not to appear in a filename). That's why I suggested the scheme of replacing any non-letter non-number by _i where i is the ASCII character code in hex. Then, we also have to escape _, since we are using it as a special character. And as I learned today, if the path begins with a number, we should prepend an escape code, since Groovy IDs can't start with a number.


## Installation
### Prerequisites
Java 8 or later must be installed on your system.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use flake.nix to install Nexflow instead. This will ensure a consistent version of Nextflow and JRE as well. Add pkgs.nextflow to flake.nix and run nix develop.

if char.isalnum(): # Keep letters and numbers unchanged
escaped_filename.append(char)
elif char == '_': # Escape underscores
continue
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will erase the underscores. Rather than continue, just append _{ord(char):02x}

@charmoniumQ
Copy link
Owner

charmoniumQ commented Aug 14, 2024

Remove main.nf if it is generated.

Rename transformations.py to workflows.py.

Make sure that the workflow test is exercised when we type just test-dev, which it should be, so long as it follows Pytest's convention for naming/finding tests (which I don't know exactly).

Make a CLI front-end for it, like probe export --type nextflow [PROBE_LOG]. The front-end should follow the same steps as the CLI front-end in #35, except it does not export a graphviz pydot string. It will send the graph here (depending on the --type.

This PR should not, however, be dependent on #35. If #35 is not ready when this PR is, the function that turns a hb_graph into a dataflow_graph can be "mocked" to return the graph I supplied you here (f(x) returns constant C for all x) just for now, and a future PR will "hook them up"; if #35 is merged before this is ready, we should obviously use the "real" hb_graph -> dataflow_graph function.

The front-end CLI need not be tested, at this juncture; the "glass box" test you already have (testing the underlying functionality rather than the end-to-end/black-box functionality) is acceptable at this point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants