Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: develop a crate to parse cwl documents #4

Merged
merged 21 commits into from
Jan 5, 2025

Conversation

nsyzrantsev
Copy link
Contributor

@nsyzrantsev nsyzrantsev commented Jan 3, 2025

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced a new CWL (Common Workflow Language) parser library for parsing and handling CWL documents.
    • Added support for parsing Command Line Tool and Workflow schemas.
    • Implemented serialization and deserialization of CWL documents.
    • Added new CWL tool definitions and workflow schema capabilities.
    • Introduced new modules for enhanced organization within the parser library.
    • Added a new package for the zefiro-cwl-parser with version 0.1.0.
    • Defined new input and output parameters for CWL tools and workflows.
  • Documentation

    • Added comprehensive README for the new CWL parser library.
    • Included example CWL schema and values documents.
  • Chores

    • Updated GitHub Actions workflow configurations, including the reintroduction of the publishing job.
    • Added new package to workspace configuration.
    • Removed dependencies from core package.
    • Modified CI configuration for cargo check commands.

@nsyzrantsev nsyzrantsev self-assigned this Jan 3, 2025
@nsyzrantsev nsyzrantsev linked an issue Jan 3, 2025 that may be closed by this pull request
Copy link

coderabbitai bot commented Jan 3, 2025

Walkthrough

This pull request introduces enhancements to the zefiro-cwl-parser library, designed for parsing and handling Common Workflow Language (CWL) documents. Key changes include the reintroduction of the publish-crates-io job in the GitHub Actions workflow, the removal of the dependencies section from the Cargo.toml of the zefiro-core package, and the addition of a new Cargo.toml file for the zefiro-cwl-parser package. The library now supports comprehensive CWL schema and value parsing with improved organization and modularity.

Changes

File Change Summary
.github/workflows/cd.yml Reintroduced publish-crates-io job for package publishing; minor update to generate-changelog job command.
zefiro-core/Cargo.toml Removed dependencies section; retained rust-version in [package].
Cargo.toml Added zefiro-cwl-parser as a workspace member.
zefiro-core/zefiro-cwl-parser/Cargo.toml New package with metadata and dependencies defined.
zefiro-core/zefiro-cwl-parser/README.md Added documentation for the new library.
zefiro-core/zefiro-cwl-parser/examples/data/*.yml Introduced various CWL schema and values examples, including clt-step-schema.yml, clt-step-values.yml, and wf-step-schema.yml.
zefiro-core/zefiro-cwl-parser/src/lib.rs Added modules schema and values; exported CwlSchema and CwlValues.
zefiro-core/zefiro-cwl-parser/src/schema/*.rs Defined schemas for CWL CommandLineTool and Workflow, including input/output parameters and requirements.
zefiro-core/zefiro-cwl-parser/src/values/*.rs Introduced CwlValues and types for CWL file representations, including File, Directory, and CwlValueType.
.github/workflows/ci.yml Updated commands in the CI configuration for the "Check" job.

Possibly related issues

  • Develop a crate to parse CWL documents #3: This PR addresses the development of a crate to parse CWL documents, aligning with the changes made in the zefiro-cwl-parser library for parsing CWL schema and values.

Possibly related PRs

Poem

🐰 A Workflow Parser, Swift and Bright
Parsing CWL with Rustian might
Schemas and values, now so clear
Common Workflows, no longer a fear!
CodeRabbit's magic, taking flight! 🚀


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1fc7002 and 1d41817.

📒 Files selected for processing (1)
  • README.md (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • README.md

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

codecov bot commented Jan 3, 2025

Codecov Report

Attention: Patch coverage is 0% with 4 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...firo-core/zefiro-cwl-parser/src/schema/document.rs 0.00% 2 Missing ⚠️
...firo-core/zefiro-cwl-parser/src/values/document.rs 0.00% 2 Missing ⚠️
Flag Coverage Δ
unit-tests 16.66% <0.00%> (-33.34%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...firo-core/zefiro-cwl-parser/src/schema/document.rs 0.00% <0.00%> (ø)
...firo-core/zefiro-cwl-parser/src/values/document.rs 0.00% <0.00%> (ø)

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 11

🧹 Nitpick comments (14)
zefiro-core/zefiro-cwl-parser/src/schema/command_line_tool.rs (1)

1-2: Fix import ordering to satisfy formatting constraints.

The pipeline indicates a formatting issue with the import order. Ensure the imports are arranged according to your project's style guidelines, e.g., grouped by external vs. internal crates or alphabetically sorted.

You might apply a diff similar to the following (exact ordering may vary based on your style guide):

-use crate::schema::types::{Any, CwlSchemaType, Documentation};
-use crate::schema::requirements::CommandLineToolRequirement;
+use crate::schema::requirements::CommandLineToolRequirement;
+use crate::schema::types::{Any, CwlSchemaType, Documentation};
🧰 Tools
🪛 GitHub Actions: Continuous Integration

[warning] 1-1: Formatting issue: Import order needs to be reorganized

zefiro-core/zefiro-cwl-parser/src/schema/document.rs (1)

102-103: Add test coverage for to_yaml method.

The lines for writing YAML (to_yaml) are not tested according to coverage reports. Consider implementing a quick test that verifies correct serialization to an in-memory buffer.

Would you like a quick example test snippet to ensure coverage?

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 102-103: zefiro-core/zefiro-cwl-parser/src/schema/document.rs#L102-L103
Added lines #L102 - L103 were not covered by tests

zefiro-core/zefiro-cwl-parser/src/lib.rs (1)

3-3: Add an extra newline to address formatting requirements.

According to the pipeline, an additional newline is required after the module declarations to maintain consistent styling. For example:

 pub mod schema;
 pub mod values;

-pub use crate::schema::document::CwlSchema;
+ 
+pub use crate::schema::document::CwlSchema;
 pub use crate::values::document::CwlValues;
🧰 Tools
🪛 GitHub Actions: Continuous Integration

[warning] 3-3: Formatting issue: Extra newline needed after imports

zefiro-core/zefiro-cwl-parser/src/values/types.rs (2)

6-9: Consider renaming the struct or importing with an alias to avoid confusion with std::fs::File.

While this File struct works as intended, its name might be confused with Rust’s standard library File. You can either rename it or import it with an alias to prevent ambiguity.


17-23: Same naming consideration for Directory.

Similar to the File struct, consider naming this CwlDirectory or another descriptive name to avoid overshadowing possible Rust standard library usage.

zefiro-core/zefiro-cwl-parser/src/schema/types.rs (1)

12-39: Clarity check for CwlSchemaType.

Overall, the enum covers various CWL schema types well. However, the Any(String) variant name could conflict with the Any enum. Consider renaming for improved clarity.

zefiro-core/zefiro-cwl-parser/src/values/document.rs (1)

10-14: Fix the missing trailing comma to comply with Rust style guidelines.

The pipeline indicates a formatting issue. Adding a trailing comma after the last struct field helps maintain consistent formatting.

 pub struct CwlValues {
     #[serde(flatten)]
-    values: HashMap<String, CwlValueType>
+    values: HashMap<String, CwlValueType>,
 }
🧰 Tools
🪛 GitHub Actions: Continuous Integration

[warning] 10-10: Formatting issue: Missing comma in struct field

zefiro-core/zefiro-cwl-parser/examples/data/clt-step-schema.yml (3)

21-22: Review the output location modification approach

The outputEval expression modifies the location property in-place:

outputEval: ${self[0].location += inputs.output_location_subdir; return self[0]}

This mutation-based approach could lead to unexpected side effects. Consider creating a new location instead:

outputEval: ${return Object.assign({}, self[0], {location: self[0].location + inputs.output_location_subdir})}

26-29: Document resource requirements

The resource requirements would benefit from documentation explaining why these specific values were chosen:

  • coresMin: 2
  • outdirMin: 1000 (MB)
  • ramMin: 1024 (MB)

Consider adding comments to explain the resource calculations and any empirical data supporting these values.


31-32: Improve readability of time limit expression

The time limit expression $(60*60*1) (3600 seconds) could be more readable.

Consider using a more explicit expression or adding a comment:

timelimit: $(60*60*1)  # 1 hour in seconds
# or
timelimit: 3600  # 1 hour in seconds
.github/workflows/cd.yml (1)

46-46: Add newline at end of file

Add a newline character at the end of the file to comply with POSIX standards.

       #     cargo publish --manifest-path zefiro-core/Cargo.toml \
-      #       --locked --token ${{ secrets.CARGO_REGISTRY_TOKEN }}
+      #       --locked --token ${{ secrets.CARGO_REGISTRY_TOKEN }}
+
🧰 Tools
🪛 yamllint (1.35.1)

[error] 46-46: no new line character at the end of file

(new-line-at-end-of-file)

zefiro-core/zefiro-cwl-parser/examples/data/wf-step-schema.yml (1)

5-10: Consider using semantic names for workflow inputs

The input IDs are prefixed with step__, which couples them tightly to the step name. Consider using more semantic names that reflect their purpose rather than their usage location.

-  - id: step__in_file
+  - id: input_files
     type:
       type: array
       items: File
-  - id: step__out_file
+  - id: output_filename
     type: string
zefiro-core/zefiro-cwl-parser/README.md (2)

7-8: Enhance specification support documentation

The overview is vague about which CWL v1.2 fields are supported. Consider:

  1. Adding a table of supported/unsupported fields
  2. Linking to specific CWL v1.2 specification sections
  3. Documenting any limitations or planned features
🧰 Tools
🪛 GitHub Check: Typos

[warning] 7-7:
"stuctures" should be "structures".

🪛 GitHub Actions: Continuous Integration

[error] 7-7: Typo: 'stuctures' should be 'structures'


22-58: Enhance code examples with error handling

The example code uses the ? operator but doesn't show:

  1. The error types that might be returned
  2. How to handle different error cases
  3. The complete function signature with Result type

Consider adding:

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Existing example code
    Ok(())
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 259ea52 and 825503b.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (18)
  • .github/workflows/cd.yml (1 hunks)
  • Cargo.toml (1 hunks)
  • zefiro-core/Cargo.toml (0 hunks)
  • zefiro-core/zefiro-cwl-parser/Cargo.toml (1 hunks)
  • zefiro-core/zefiro-cwl-parser/README.md (1 hunks)
  • zefiro-core/zefiro-cwl-parser/examples/data/clt-step-schema.yml (1 hunks)
  • zefiro-core/zefiro-cwl-parser/examples/data/clt-step-values.yml (1 hunks)
  • zefiro-core/zefiro-cwl-parser/examples/data/wf-step-schema.yml (1 hunks)
  • zefiro-core/zefiro-cwl-parser/src/lib.rs (1 hunks)
  • zefiro-core/zefiro-cwl-parser/src/schema/command_line_tool.rs (1 hunks)
  • zefiro-core/zefiro-cwl-parser/src/schema/document.rs (1 hunks)
  • zefiro-core/zefiro-cwl-parser/src/schema/mod.rs (1 hunks)
  • zefiro-core/zefiro-cwl-parser/src/schema/requirements.rs (1 hunks)
  • zefiro-core/zefiro-cwl-parser/src/schema/types.rs (1 hunks)
  • zefiro-core/zefiro-cwl-parser/src/schema/workflow.rs (1 hunks)
  • zefiro-core/zefiro-cwl-parser/src/values/document.rs (1 hunks)
  • zefiro-core/zefiro-cwl-parser/src/values/mod.rs (1 hunks)
  • zefiro-core/zefiro-cwl-parser/src/values/types.rs (1 hunks)
💤 Files with no reviewable changes (1)
  • zefiro-core/Cargo.toml
✅ Files skipped from review due to trivial changes (3)
  • zefiro-core/zefiro-cwl-parser/src/values/mod.rs
  • zefiro-core/zefiro-cwl-parser/src/schema/mod.rs
  • zefiro-core/zefiro-cwl-parser/Cargo.toml
🧰 Additional context used
🪛 GitHub Check: Typos
zefiro-core/zefiro-cwl-parser/README.md

[warning] 7-7:
"stuctures" should be "structures".

🪛 GitHub Actions: Continuous Integration
zefiro-core/zefiro-cwl-parser/README.md

[error] 7-7: Typo: 'stuctures' should be 'structures'

zefiro-core/zefiro-cwl-parser/src/schema/command_line_tool.rs

[warning] 1-1: Formatting issue: Import order needs to be reorganized

zefiro-core/zefiro-cwl-parser/src/lib.rs

[warning] 3-3: Formatting issue: Extra newline needed after imports

Cargo.toml

[error] Package believes it's in a workspace when it's not. Add 'zefiro-core' to the workspace.members array in /home/runner/work/zefiro/zefiro/Cargo.toml or add an empty [workspace] table to the package's manifest.

zefiro-core/zefiro-cwl-parser/src/schema/document.rs

[warning] 3-7: Formatting issue: Import statement needs to be split into multiple lines

zefiro-core/zefiro-cwl-parser/src/values/document.rs

[warning] 10-10: Formatting issue: Missing comma in struct field

zefiro-core/zefiro-cwl-parser/src/schema/workflow.rs

[warning] 43-43: Formatting issue: Missing comma at the end of struct field

🪛 yamllint (1.35.1)
.github/workflows/cd.yml

[error] 46-46: no new line character at the end of file

(new-line-at-end-of-file)

🪛 GitHub Check: codecov/patch
zefiro-core/zefiro-cwl-parser/src/schema/document.rs

[warning] 102-103: zefiro-core/zefiro-cwl-parser/src/schema/document.rs#L102-L103
Added lines #L102 - L103 were not covered by tests

zefiro-core/zefiro-cwl-parser/src/values/document.rs

[warning] 72-73: zefiro-core/zefiro-cwl-parser/src/values/document.rs#L72-L73
Added lines #L72 - L73 were not covered by tests

🔇 Additional comments (19)
zefiro-core/zefiro-cwl-parser/src/schema/command_line_tool.rs (5)

11-30: All good: CommandLineTool struct.

The schema definition looks correct and comprehensive, with optional fields appropriately wrapped in Option. Serialization attributes are used correctly. This ensures robust and flexible serialization/deserialization.


32-47: CommandInputParameter struct looks good.

Fields and Serde attributes appear to be correctly defined, and optional fields are handled properly.


48-61: CommandOutputParameter struct looks good.

No issues found. It closely parallels the input parameter approach and maintains consistency.


62-77: CommandLineBinding struct is well-defined.

Optional parameters are correctly handled. The positioning and prefix approach is consistent with CWL specs.


78-88: OutputBinding struct is well-structured.

Serialization attributes, including skip_serializing_if, help keep the output concise. The fields align well with CWL’s command line binding pattern.

zefiro-core/zefiro-cwl-parser/src/schema/workflow.rs (1)

1-91: Workflow schema logic is sound.

All structs adhere to CWL v1.2 naming and layout, with Serde usage consistent across optional fields.

🧰 Tools
🪛 GitHub Actions: Continuous Integration

[warning] 43-43: Formatting issue: Missing comma at the end of struct field

zefiro-core/zefiro-cwl-parser/src/schema/requirements.rs (1)

1-94: ResourceRequirement defaults and additional structures look good.

The default values and usage of skip_serializing_none align well with CWL conventions, ensuring a robust specification of requirements.

zefiro-core/zefiro-cwl-parser/src/schema/document.rs (1)

1-126: Overall logic is well-structured.

The CwlSchema enum and associated methods effectively handle reading and writing CWL objects from/to YAML. The version and class checks are consistent with the CWL v1.2 spec.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 102-103: zefiro-core/zefiro-cwl-parser/src/schema/document.rs#L102-L103
Added lines #L102 - L103 were not covered by tests

🪛 GitHub Actions: Continuous Integration

[warning] 3-7: Formatting issue: Import statement needs to be split into multiple lines

zefiro-core/zefiro-cwl-parser/src/lib.rs (1)

1-5: Export statements look appropriate.

Exposing CwlSchema and CwlValues publicly is clear and aligns well with your crate’s structure.

🧰 Tools
🪛 GitHub Actions: Continuous Integration

[warning] 3-3: Formatting issue: Extra newline needed after imports

zefiro-core/zefiro-cwl-parser/src/values/types.rs (1)

32-44: Enum CwlValueType is well-structured.

The usage of #[serde(untagged)] for flexible deserialization is clear and consistent. Good job!

zefiro-core/zefiro-cwl-parser/src/schema/types.rs (5)

6-10: “Any” enum usage is acceptable, but be cautious with large YAML values.

Relying on Any(YValue) is useful for flexible parsing, but ensure you handle memory usage if large YAML documents are parsed.


42-47: Documentation enum approach is good.

Structuring single vs. multi-line documentation with separate variants is a neat approach.


49-54: Multiple format support is well-implemented.

Splitting single vs. multiple formats is helpful for validation and clarity in serialization.


56-61: Scatter enum usage is clear.

Good use of untagged variants for scattering single or multiple parameters.


63-68: Source enum is consistent with the existing pattern.

Keeping single vs. multiple source variants consistent with the rest of the schema design is logical and straightforward.

zefiro-core/zefiro-cwl-parser/examples/data/clt-step-values.yml (1)

1-4: File references look correct.

This YAML snippet aligns with the introduced File struct usage. Ensure the placeholders inputLocation and readgroup are resolved at runtime or replaced with valid values.

.github/workflows/cd.yml (1)

43-46: Review commented publish step

The commented section for publishing zefiro-core needs attention:

  1. Is this intentionally commented out?
  2. Should we add a similar step for publishing zefiro-cwl-parser?

Please clarify if we should:

  1. Remove this commented section if it's no longer needed
  2. Add a step for publishing zefiro-cwl-parser
  3. Keep both steps if both crates need publishing
🧰 Tools
🪛 yamllint (1.35.1)

[error] 46-46: no new line character at the end of file

(new-line-at-end-of-file)

zefiro-core/zefiro-cwl-parser/examples/data/wf-step-schema.yml (2)

53-56: Verify resource requirements alignment with execution environment

The resource requirements specify:

  • 2 cores minimum
  • 1000 (units?) minimum outdir size
  • 1024 (MB?) minimum RAM

Please ensure:

  1. The units for outdirMin and ramMin are documented
  2. These requirements align with the execution environment capabilities

62-64: Validate scatter configuration for parallel execution

The workflow uses dotproduct scatter method on in_file. This means:

  1. Each file in the input array will be processed independently
  2. The execution order is not guaranteed
  3. All executions must succeed for the workflow to succeed

Please ensure error handling is in place for partial failures:

zefiro-core/zefiro-cwl-parser/src/schema/workflow.rs Outdated Show resolved Hide resolved
zefiro-core/zefiro-cwl-parser/src/schema/document.rs Outdated Show resolved Hide resolved
.github/workflows/cd.yml Outdated Show resolved Hide resolved
.github/workflows/cd.yml Outdated Show resolved Hide resolved
zefiro-core/zefiro-cwl-parser/README.md Outdated Show resolved Hide resolved
zefiro-core/zefiro-cwl-parser/README.md Outdated Show resolved Hide resolved
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
zefiro-core/zefiro-cwl-parser/README.md (2)

22-58: Consider adding error handling examples

While the example is comprehensive, it would be beneficial to demonstrate proper error handling. Consider:

  1. Showing how to handle specific error types
  2. Adding comments about what errors might occur
  3. Providing an example with explicit error handling instead of using the ? operator

Example enhancement:

// Example with explicit error handling
let schema = match CwlSchema::from_path("workflow.yml") {
    Ok(schema) => schema,
    Err(e) => {
        eprintln!("Failed to parse CWL schema: {}", e);
        // Handle specific error types
        return Err(e);
    }
};

1-2: Consider adding a table of contents

For better navigation, consider adding a table of contents at the beginning of the README. This would be especially helpful as the documentation grows with more examples and features.

Example:

## Table of Contents
- [Overview](#overview)
- [Usage](#usage)
  - [Parsing CWL Schema Documents](#parsing-cwl-schema-documents)
  - [Parsing CWL Values Documents](#parsing-cwl-values-documents)
zefiro-core/zefiro-cwl-parser/src/main.rs (2)

6-6: Consider more user-friendly error handling.
Using .expect("Failed to deserialize CWL values document") can abruptly terminate the program. Depending on use cases, a more graceful error handling approach (e.g., returning Result or printing a user-facing message) may be preferable.


17-19: Evaluate reactivating or removing the commented-out logic.
If you plan to iterate over schema values for debugging or validation, consider reintroducing the code in lines 17–19 or removing it if it's no longer needed. Dead code can become confusing over time.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 825503b and 8810bf5.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • .github/workflows/ci.yml (1 hunks)
  • zefiro-core/Cargo.toml (1 hunks)
  • zefiro-core/zefiro-cwl-parser/Cargo.toml (1 hunks)
  • zefiro-core/zefiro-cwl-parser/README.md (1 hunks)
  • zefiro-core/zefiro-cwl-parser/src/main.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • zefiro-core/zefiro-cwl-parser/Cargo.toml
🧰 Additional context used
🪛 GitHub Actions: Continuous Integration
zefiro-core/Cargo.toml

[error] Multiple workspace roots found in the same workspace: '/home/runner/work/zefiro/zefiro/zefiro-core' and '/home/runner/work/zefiro/zefiro'

🔇 Additional comments (5)
zefiro-core/zefiro-cwl-parser/README.md (3)

1-8: LGTM! Clear and informative overview.

The overview section effectively communicates the library's purpose and limitations, with helpful links to the CWL specification.


16-16: Update dependency version placeholder

The dependency line still contains a placeholder that needs to be replaced with the actual version.


70-75: Clarify supported storage backends

The example shows an S3 URI (s3://bucket/input.txt), but there's no mention of supported storage backends or URI schemes. Please clarify:

  1. What storage backends are supported?
  2. What URI schemes are valid?
  3. Are there any prerequisites for using S3 or other remote storage?

Let's check if there's any documentation about storage backends:

zefiro-core/zefiro-cwl-parser/src/main.rs (1)

7-12: Validate the content of iterated values.
Before matching on the types (e.g., CwlValueType::File), consider adding checks to ensure that the values are valid and consistent. If the file path or associated data is invalid, you might risk panics or silent errors down the line.

.github/workflows/ci.yml (1)

24-26: Good practice for deterministic builds.
Switching from --offline to --locked ensures your build uses Cargo.lock dependencies exactly, improving reproducibility and consistency. Watch out for environment constraints (e.g., ephemeral container runs) so that locked builds can still find all needed crates.

zefiro-core/Cargo.toml Outdated Show resolved Hide resolved
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (1)
zefiro-core/zefiro-cwl-parser/src/values/types.rs (1)

12-15: 🛠️ Refactor suggestion

Return references instead of cloning strings.

As noted in the previous review, to avoid unnecessary allocations, consider returning &str instead of String in both File and Directory implementations.

-    pub fn get_location(&self) -> String {
-        self.location.clone()
+    pub fn get_location(&self) -> &str {
+        &self.location
     }

Also applies to: 26-29

🧹 Nitpick comments (11)
zefiro-core/zefiro-cwl-parser/src/values/types.rs (2)

6-9: Consider extracting common fields into a trait or base struct.

The File and Directory structs share identical fields. Consider reducing duplication by extracting common fields into a trait or base struct.

Example approach using a trait:

pub trait CwlLocation {
    fn get_location(&self) -> &str;
}

#[derive(Clone, Debug, Deserialize, Serialize)]
pub struct File {
    pub class: String,
    pub location: String,
}

impl CwlLocation for File {
    fn get_location(&self) -> &str {
        &self.location
    }
}

// Similar implementation for Directory

Also applies to: 20-23


31-43: Add documentation for CwlValueType enum.

The enum lacks documentation explaining its purpose and the meaning of each variant. Consider adding documentation similar to the File and Directory structs.

Example:

/// Represents the various value types supported in CWL
/// See: https://www.commonwl.org/v1.2/CommandLineTool.html#CWLType
#[derive(Clone, Debug, Deserialize, Serialize)]
#[serde(untagged)]
pub enum CwlValueType {
    /// Boolean values (true/false)
    Boolean(bool),
    // ... document other variants
}
zefiro-core/zefiro-cwl-parser/src/values/document.rs (1)

81-85: Enhance test coverage with error cases.

The current test only verifies successful parsing. Consider adding tests for:

  • Invalid YAML syntax
  • Missing required fields
  • Invalid file paths

Example:

#[test]
fn test_parse_invalid_file_path() {
    let result = CwlValues::from_path("nonexistent.yml");
    assert!(result.is_err());
    assert!(result.unwrap_err().to_string().contains("Failed to open file"));
}
zefiro-core/zefiro-cwl-parser/src/schema/command_line_tool.rs (2)

37-46: Add missing optional fields for CommandInputParameter.

Consider adding these optional fields from the CWL spec:

  • label: For human-readable display
  • secondaryFiles: For associated files
  • streamable: For streaming optimization
  • format: For data format identification
  • loadContents: For small file contents

53-60: Add missing optional fields for CommandOutputParameter.

Consider adding these optional fields from the CWL spec:

  • label: For human-readable display
  • secondaryFiles: For associated files
  • streamable: For streaming optimization
  • format: For data format identification
zefiro-core/zefiro-cwl-parser/src/schema/document.rs (6)

11-11: Consider future CWL version compatibility.

The SUPPORTED_VERSIONS constant currently only supports CWL v1.2. Consider:

  1. Adding support for older versions (v1.0, v1.1) if backward compatibility is desired
  2. Documenting the version support policy
  3. Making the version check configurable for future updates

13-19: Enhance enum documentation with more details.

While the documentation is good, consider adding:

  1. Examples of when to use each variant
  2. Links to CWL specification for CommandLineTool and Workflow
  3. Notes about the untagged serialization behavior

31-34: Consider using AsRef for more flexible path handling.

The from_path method could be more flexible by accepting various path types.

-    pub fn from_path(path: &str) -> Result<Self> {
-        let reader = BufReader::new(File::open(path)?);
+    pub fn from_path<P: AsRef<std::path::Path>>(path: P) -> Result<Self> {
+        let reader = BufReader::new(File::open(path.as_ref())?);

37-53: Enhance version validation and error handling.

The version validation could be improved:

  1. Consider using a proper version comparison instead of string matching
  2. Add more context to error messages
  3. Consider logging the actual version when an unsupported version is encountered
     pub fn from_yaml(value: Value) -> Result<Self> {
         let version = value
             .get("cwlVersion")
             .and_then(Value::as_str)
-            .ok_or_else(|| anyhow::anyhow!("Failed to determine CWL specification version."))?;
+            .ok_or_else(|| anyhow::anyhow!("Missing required field 'cwlVersion' in CWL document"))?;
         ensure!(
             SUPPORTED_VERSIONS.contains(&version),
-            "Unsupported CWL version: {version}"
+            "Unsupported CWL version: {}. Supported versions are: {:?}", 
+            version, SUPPORTED_VERSIONS
         );

94-97: Improve error handling specificity in from_string.

The error handling could be more specific about what went wrong during parsing.

     pub fn from_string(yaml_input: &str) -> Result<Self, Error> {
-        serde_yaml::from_str(yaml_input)
-            .map_err(|e| Error::msg(format!("Failed to parse CWL schema from string: {}", e)))
+        let value: Value = serde_yaml::from_str(yaml_input)
+            .map_err(|e| Error::msg(format!("Invalid YAML syntax: {}", e)))?;
+        Self::from_yaml(value)

110-116: Add documentation for FromStr implementation.

Consider adding documentation explaining:

  1. The purpose of implementing FromStr
  2. Example usage
  3. Error conditions
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8810bf5 and a4d44b0.

📒 Files selected for processing (12)
  • Cargo.toml (1 hunks)
  • zefiro-core/Cargo.toml (1 hunks)
  • zefiro-core/zefiro-cwl-parser/src/lib.rs (1 hunks)
  • zefiro-core/zefiro-cwl-parser/src/schema/command_line_tool.rs (1 hunks)
  • zefiro-core/zefiro-cwl-parser/src/schema/document.rs (1 hunks)
  • zefiro-core/zefiro-cwl-parser/src/schema/mod.rs (1 hunks)
  • zefiro-core/zefiro-cwl-parser/src/schema/requirements.rs (1 hunks)
  • zefiro-core/zefiro-cwl-parser/src/schema/types.rs (1 hunks)
  • zefiro-core/zefiro-cwl-parser/src/schema/workflow.rs (1 hunks)
  • zefiro-core/zefiro-cwl-parser/src/values/document.rs (1 hunks)
  • zefiro-core/zefiro-cwl-parser/src/values/mod.rs (1 hunks)
  • zefiro-core/zefiro-cwl-parser/src/values/types.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • zefiro-core/zefiro-cwl-parser/src/values/mod.rs
  • zefiro-core/zefiro-cwl-parser/src/lib.rs
  • zefiro-core/zefiro-cwl-parser/src/schema/mod.rs
  • zefiro-core/Cargo.toml
  • zefiro-core/zefiro-cwl-parser/src/schema/types.rs
🔇 Additional comments (7)
Cargo.toml (1)

7-11: LGTM! Workspace configuration looks correct.

The workspace members are properly configured, including both the new zefiro-cwl-parser crate and its parent zefiro-core.

zefiro-core/zefiro-cwl-parser/src/values/document.rs (2)

14-17: LGTM! Good use of serde flatten and HashMap.

The structure is well-designed with appropriate use of serde attributes and HashMap for flexible key-value storage.


71-73: Add test coverage for serialization methods.

The to_yaml method lacks test coverage. Additionally, consider testing:

  • Error cases for from_path and from_string
  • The to_string method

Would you like help drafting comprehensive tests for these methods?

zefiro-core/zefiro-cwl-parser/src/schema/command_line_tool.rs (1)

11-30: ⚠️ Potential issue

Add missing required and optional fields per CWL spec.

The CommandLineTool struct is missing several fields defined in the CWL spec:

  • Required: baseCommand (string or array of strings)
  • Optional but important: arguments, stdin, stdout, stderr

Additionally, consider adding validation for the class field to ensure it's "CommandLineTool".

zefiro-core/zefiro-cwl-parser/src/schema/requirements.rs (1)

35-37: Add missing optional Docker requirement fields.

Add these optional fields from the CWL spec:

  • dockerLoad: Load from a file
  • dockerFile: Supply inline Dockerfile
  • dockerImport: Import from a URL
  • dockerImageId: Use specific image ID
  • dockerOutputDirectory: Set working directory

The empty docker_pull validation issue was previously reported.

zefiro-core/zefiro-cwl-parser/src/schema/document.rs (2)

1-9: Split the import statements into multiple lines for better readability.

The import statements should be split for better readability and to comply with formatting guidelines.


118-129: Enhance test coverage.

The current test suite only covers successful parsing. Consider adding tests for:

  1. Error cases (invalid version, invalid class, malformed YAML)
  2. Serialization methods (to_string, to_yaml)
  3. FromStr implementation
  4. Edge cases (empty document, minimal valid documents)

Would you like me to generate additional test cases to improve coverage?

@zefiroproj zefiroproj deleted a comment from coderabbitai bot Jan 4, 2025
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

🧹 Nitpick comments (9)
zefiro-core/zefiro-cwl-parser/src/schema/requirements.rs (3)

4-7: Add documentation for resource requirement constants.

Consider adding documentation comments for these constants to explain their purpose and units (MB).

+/// Default number of CPU cores
 const CPU_NUM_DEFAULT: u32 = 1;
+/// Default RAM size in megabytes
 const RAM_SIZE_IN_MB_DEFAULT: u32 = 1024;
+/// Default minimum temporary directory size in megabytes
 const TMPDIR_MIN_IN_MB_DEFAULT: u32 = 1024;
+/// Default minimum output directory size in megabytes
 const OUTDIR_MIN_IN_MB_DEFAULT: u32 = 1024;

18-27: Add documentation link and skip_serializing_none attribute.

Consider these enhancements:

  1. Add a link to the CWL spec for CommandLineToolRequirement
  2. Add the skip_serializing_none attribute which might be needed for optional fields in the variants
+/// See: https://www.commonwl.org/v1.2/CommandLineTool.html#Requirements
 /// Describes requirements for `CommandLineTool`.
+#[skip_serializing_none]
 #[derive(Clone, Debug, Deserialize, Serialize)]
 #[serde(tag = "class")]
 pub enum CommandLineToolRequirement {

99-106: Consider adding a default value for enable_reuse.

It might be useful to provide a default value for enable_reuse to make the struct easier to use.

 pub struct WorkReuse {
+    #[serde(default)]
     pub enable_reuse: bool,
 }

+impl Default for WorkReuse {
+    fn default() -> Self {
+        Self {
+            enable_reuse: true,  // or false, depending on your use case
+        }
+    }
+}
.github/workflows/cd.yml (2)

28-40: Enhance CI job reliability and performance

While the basic setup is good, consider these improvements:

  1. Add a timeout to prevent hung jobs
  2. Specify explicit Rust version
  3. Add cargo dependency caching
 publish-crates-io:
   name: Publish on crates.io
   runs-on: ubuntu-22.04
+  timeout-minutes: 10
   needs: generate-changelog
   steps:
     - name: Checkout
       uses: actions/checkout@v4
+    - name: Cache dependencies
+      uses: actions/cache@v3
+      with:
+        path: |
+          ~/.cargo/registry
+          ~/.cargo/git
+          target
+        key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}
     - name: Verify package
       run: cargo verify-project --manifest-path zefiro-core/zefiro-cwl-parser/Cargo.toml
     - name: Set the release version
       run: echo "RELEASE_VERSION=${GITHUB_REF:11}" >> "$GITHUB_ENV"
     - name: Install Rust toolchain
       uses: dtolnay/rust-toolchain@stable
+      with:
+        toolchain: 1.75.0  # or your minimum supported version

44-44: Add newline at end of file

The file is missing a newline character at the end, which is a POSIX requirement.

       #   cargo publish --manifest-path zefiro-core/Cargo.toml \
-      #       --locked --token ${{ secrets.CARGO_REGISTRY_TOKEN }}
+      #       --locked --token ${{ secrets.CARGO_REGISTRY_TOKEN }}
+
🧰 Tools
🪛 yamllint (1.35.1)

[error] 44-44: no new line character at the end of file

(new-line-at-end-of-file)

zefiro-core/zefiro-cwl-parser/src/schema/workflow.rs (3)

21-21: Add documentation for the requirements field.

Consider adding a doc comment explaining:

  1. The purpose of workflow requirements
  2. Common requirement types
  3. Impact on workflow execution

29-34: Consider making id field required for WorkflowInputParameter.

The id field is typically required in CWL to uniquely identify input parameters. Consider:

  1. Making id non-optional
  2. Adding validation for unique IDs within a workflow

69-69: Enhance type safety for scatter_method.

Consider using an enum for scatter_method instead of String:

#[derive(Clone, Debug, Deserialize, Serialize)]
#[serde(rename_all = "snake_case")]
pub enum ScatterMethod {
    DotProduct,
    NestedCrossProduct,
    FlatCrossProduct,
}
zefiro-core/zefiro-cwl-parser/src/values/types.rs (1)

1-43: Consider adding custom error types for better error handling.

The current implementation could benefit from a dedicated error type to handle various validation errors that might occur during struct creation or value conversion.

Consider adding:

#[derive(Debug, thiserror::Error)]
pub enum CwlValueError {
    #[error("Invalid class: expected {expected}, got {actual}")]
    InvalidClass {
        expected: &'static str,
        actual: String,
    },
    #[error("Invalid location: {0}")]
    InvalidLocation(String),
    #[error("Type conversion error: {0}")]
    TypeConversion(String),
}

// Then update the constructors to use this error type
impl File {
    pub fn new(location: String) -> Result<Self, CwlValueError> {
        if location.is_empty() {
            return Err(CwlValueError::InvalidLocation(
                "location cannot be empty".to_string()
            ));
        }
        // ... rest of the implementation
    }
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7c44112 and ee3fc03.

📒 Files selected for processing (6)
  • .github/workflows/cd.yml (1 hunks)
  • zefiro-core/zefiro-cwl-parser/README.md (1 hunks)
  • zefiro-core/zefiro-cwl-parser/src/schema/command_line_tool.rs (1 hunks)
  • zefiro-core/zefiro-cwl-parser/src/schema/requirements.rs (1 hunks)
  • zefiro-core/zefiro-cwl-parser/src/schema/workflow.rs (1 hunks)
  • zefiro-core/zefiro-cwl-parser/src/values/types.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • zefiro-core/zefiro-cwl-parser/README.md
🧰 Additional context used
📓 Learnings (2)
zefiro-core/zefiro-cwl-parser/src/schema/workflow.rs (1)
Learnt from: nsyzrantsev
PR: zefiroproj/zefiro#4
File: zefiro-core/zefiro-cwl-parser/src/schema/workflow.rs:61-70
Timestamp: 2025-01-04T16:22:22.394Z
Learning: In `zefiro-core/zefiro-cwl-parser/src/schema/workflow.rs`, the user explicitly wants the `WorkflowStep::run` field to support only `CommandLineTool`.
zefiro-core/zefiro-cwl-parser/src/schema/requirements.rs (1)
Learnt from: nsyzrantsev
PR: zefiroproj/zefiro#4
File: zefiro-core/zefiro-cwl-parser/src/schema/requirements.rs:83-85
Timestamp: 2025-01-04T16:02:13.319Z
Learning: For the `ToolTimeLimit.timelimit` field in `zefiro-core/zefiro-cwl-parser/src/schema/requirements.rs`, the user indicated that the type can be either a string or a u32.
🪛 yamllint (1.35.1)
.github/workflows/cd.yml

[error] 44-44: no new line character at the end of file

(new-line-at-end-of-file)

🔇 Additional comments (14)
zefiro-core/zefiro-cwl-parser/src/schema/requirements.rs (6)

9-16: LGTM! Well-structured workflow requirements enum.

The enum is well-documented and correctly implements serialization with appropriate tagging.


34-36: Empty docker_pull validation is needed

Based on the code inspection, there's currently no validation for empty docker_pull strings in the DockerRequirement struct.

Consider implementing validation using a newtype pattern:

#[derive(Clone, Debug, Deserialize, Serialize)]
pub struct DockerImage(String);

impl DockerImage {
    pub fn new(image: String) -> Result<Self, &'static str> {
        if image.trim().is_empty() {
            Err("docker_pull cannot be empty")
        } else {
            Ok(Self(image))
        }
    }
}

pub struct DockerRequirement {
    pub docker_pull: DockerImage,
}

43-55: Enhance ResourceRequirement with maximum limits.

Consider adding maximum resource fields from the CWL spec:

  • coresMax
  • ramMax
  • tmpdirMax
  • outdirMax

72-76: LGTM! Clear and concise JavaScript requirement implementation.

The unit-like struct is well-documented and appropriately implements the required traits.


82-91: LGTM! Correct implementation of timelimit as discussed.

The implementation correctly handles both numeric seconds and string expressions using an untagged enum, as previously agreed upon.


93-97: LGTM! Well-documented scatter feature requirement.

The implementation is clear and follows the same pattern as other unit-like requirement structs.

.github/workflows/cd.yml (1)

26-26: LGTM!

The git-cliff command formatting change maintains the same functionality.

zefiro-core/zefiro-cwl-parser/src/schema/workflow.rs (3)

1-6: Well-organized imports with appropriate dependencies!

The imports are properly organized, separating internal module imports from external crate dependencies. The use of serde and serde_with is appropriate for CWL document parsing and serialization.


36-54: Well-designed output parameter structure!

Good implementation of:

  1. Flexible output source representation using untagged enum
  2. Clear documentation with spec reference
  3. Appropriate optional fields

84-90: Clean and minimal implementation of workflow step output!

The implementation is appropriately simple with just the required id field and clear documentation.

zefiro-core/zefiro-cwl-parser/src/values/types.rs (2)

12-14: Return a reference instead of cloning the location.

To avoid unnecessary allocations, consider returning &str instead of String, as the caller can clone if needed.

-    pub fn get_location(&self) -> String {
-        self.location.clone()
+    pub fn get_location(&self) -> &str {
+        &self.location
     }

31-43: Enhance CwlValueType enum with documentation and helper methods.

The enum could benefit from:

  1. Documentation for each variant explaining their CWL type mapping
  2. Helper methods for type checking and common conversions
  3. Consider potential serialization ambiguity with untagged enum (e.g., i32 vs i64)

Consider adding:

/// Represents all possible CWL value types
#[derive(Clone, Debug, Deserialize, Serialize)]
#[serde(untagged)]
pub enum CwlValueType {
    /// Represents a boolean value
    Boolean(bool),
    /// Represents a 32-bit integer
    Int(i32),
    /// Represents a 64-bit integer
    Long(i64),
    // ... (document other variants)
}

impl CwlValueType {
    pub fn is_numeric(&self) -> bool {
        matches!(self, 
            Self::Int(_) | 
            Self::Long(_) | 
            Self::Float(_) | 
            Self::Double(_)
        )
    }

    pub fn as_i64(&self) -> Option<i64> {
        match self {
            Self::Int(n) => Some(*n as i64),
            Self::Long(n) => Some(*n),
            _ => None,
        }
    }
    
    // Add more helper methods as needed
}

Let's verify the potential serialization ambiguity:

zefiro-core/zefiro-cwl-parser/src/schema/command_line_tool.rs (2)

1-5: Well-structured imports and serialization setup!

Good use of serde with appropriate attributes for JSON serialization/deserialization. The imports are clean and focused on the required functionality.


1-87: Add unit tests for serialization/deserialization

Consider adding unit tests to verify:

  1. Serialization/deserialization of all structs
  2. Default values and optional fields behavior
  3. Edge cases with different field combinations

Would you like me to generate example unit tests for these structs?

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (3)
zefiro-core/zefiro-cwl-parser/src/values/types.rs (3)

3-7: 🛠️ Refactor suggestion

Add validation for File struct fields.

The struct should validate that the location is not empty and represents a valid path.

impl File {
    pub fn new(location: String) -> Result<Self, &'static str> {
        if location.is_empty() {
            return Err("location cannot be empty");
        }
        Ok(Self { location })
    }
}

9-13: 🛠️ Refactor suggestion

Return a reference instead of cloning the location.

To avoid unnecessary allocations, consider returning &str instead of String.

-    pub fn get_location(&self) -> String {
-        self.location.clone()
+    pub fn get_location(&self) -> &str {
+        &self.location
     }

15-19: 🛠️ Refactor suggestion

Add validation for Directory struct fields.

The struct should validate that the location is not empty and represents a valid path.

impl Directory {
    pub fn new(location: String) -> Result<Self, &'static str> {
        if location.is_empty() {
            return Err("location cannot be empty");
        }
        Ok(Self { location })
    }
}
🧹 Nitpick comments (4)
zefiro-core/zefiro-cwl-parser/Cargo.toml (1)

19-23: Consider using caret requirements for dependencies.

For better dependency management, consider using caret requirements (^) instead of exact versions. This allows for compatible updates while maintaining SemVer guarantees.

[dependencies]
-anyhow = "1.0.95"
-serde = { version = "1.0.216", features = ["derive"] }
-serde_with = "3.12.0"
-serde_yaml = "0.9.34"
+anyhow = "^1.0.95"
+serde = { version = "^1.0.216", features = ["derive"] }
+serde_with = "^3.12.0"
+serde_yaml = "^0.9.34"
zefiro-core/zefiro-cwl-parser/src/values/document.rs (2)

72-87: Improve doctest for to_yaml.

The current doctest doesn't verify the output. Consider adding assertions to validate the written content.

 /// ```
 /// use zefiro_cwl_parser::values::document::CwlValues;
-/// use std::io::BufWriter;
+/// use std::io::{BufWriter, Read};
 ///
 /// let yaml_input = r#"
 /// in_file:
 ///     class: File
 ///     location: 's3://bucket/path/to/input.txt'
 /// out_file: 'output.txt'
 /// "#;
 ///
 /// let values = CwlValues::from_string(yaml_input).expect("Failed to deserialize CWL values document");
 /// let mut tmpfile = tempfile::tempfile().unwrap();
-/// let mut writer = BufWriter::new(tmpfile);
-/// values.to_yaml(writer);
+/// let writer = BufWriter::new(&tmpfile);
+/// values.to_yaml(writer).expect("Failed to write YAML");
+/// 
+/// // Verify written content
+/// let mut content = String::new();
+/// tmpfile.seek(std::io::SeekFrom::Start(0)).unwrap();
+/// tmpfile.read_to_string(&mut content).unwrap();
+/// assert!(content.contains("s3://bucket/path/to/input.txt"));
+/// assert!(content.contains("output.txt"));
 /// ```

35-43: Improve error handling in from_path.

Consider using std::path::Path for better path handling and validation.

-    pub fn from_path(path: &str) -> Result<Self, Error> {
+    pub fn from_path<P: AsRef<std::path::Path>>(path: P) -> Result<Self, Error> {
+        let path = path.as_ref();
         let reader = BufReader::new(
             File::open(path)
-                .map_err(|e| Error::msg(format!("Failed to open file '{}': {}", path, e)))?,
+                .map_err(|e| Error::msg(format!("Failed to open file '{}': {}", path.display(), e)))?,
         );
 
         serde_yaml::from_reader(reader)
-            .map_err(|e| Error::msg(format!("Failed to parse CWL values from '{}'; {}", path, e)))
+            .map_err(|e| Error::msg(format!("Failed to parse CWL values from '{}'; {}", path.display(), e)))
zefiro-core/zefiro-cwl-parser/src/schema/command_line_tool.rs (1)

11-22: Consider making id field optional per CWL spec

The id field is currently required, but according to the CWL spec, it should be optional.

-    pub id: String,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub id: Option<String>,
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ee3fc03 and a7ba5af.

📒 Files selected for processing (4)
  • zefiro-core/zefiro-cwl-parser/Cargo.toml (1 hunks)
  • zefiro-core/zefiro-cwl-parser/src/schema/command_line_tool.rs (1 hunks)
  • zefiro-core/zefiro-cwl-parser/src/values/document.rs (1 hunks)
  • zefiro-core/zefiro-cwl-parser/src/values/types.rs (1 hunks)
🧰 Additional context used
📓 Learnings (1)
zefiro-core/zefiro-cwl-parser/src/schema/command_line_tool.rs (1)
Learnt from: nsyzrantsev
PR: zefiroproj/zefiro#4
File: zefiro-core/zefiro-cwl-parser/src/schema/command_line_tool.rs:52-59
Timestamp: 2025-01-04T17:12:06.568Z
Learning: The user indicated that fields like `label`, `secondaryFiles`, `streamable`, `doc`, `format`, `loadContents`, and `loadListing` are not required for the MVP version of the CommandOutputParameter struct.
🪛 GitHub Actions: Continuous Integration
zefiro-core/zefiro-cwl-parser/src/values/types.rs

[warning] 44-44: Code formatting issue: Missing newline at the end of the file

zefiro-core/zefiro-cwl-parser/src/schema/command_line_tool.rs

[warning] 35-37: Code formatting issue: Error message string needs to be wrapped across multiple lines according to rustfmt rules

🪛 GitHub Check: codecov/patch
zefiro-core/zefiro-cwl-parser/src/schema/command_line_tool.rs

[warning] 37-38: zefiro-core/zefiro-cwl-parser/src/schema/command_line_tool.rs#L37-L38
Added lines #L37 - L38 were not covered by tests


[warning] 42-42: zefiro-core/zefiro-cwl-parser/src/schema/command_line_tool.rs#L42
Added line #L42 was not covered by tests

zefiro-core/zefiro-cwl-parser/src/values/document.rs

[warning] 88-89: zefiro-core/zefiro-cwl-parser/src/values/document.rs#L88-L89
Added lines #L88 - L89 were not covered by tests

🔇 Additional comments (3)
zefiro-core/zefiro-cwl-parser/src/values/document.rs (1)

88-90: Add test coverage for the to_yaml method.

Static analysis indicates these lines lack coverage. Consider adding a test to verify the serialization functionality.

Would you like me to help draft a comprehensive test that exercises the to_yaml method?

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 88-89: zefiro-core/zefiro-cwl-parser/src/values/document.rs#L88-L89
Added lines #L88 - L89 were not covered by tests

zefiro-core/zefiro-cwl-parser/src/schema/command_line_tool.rs (2)

46-74: Implementation looks good!

The parameter structs are well-documented and correctly implement the essential fields needed for the MVP version.


81-90: 🛠️ Refactor suggestion

Update position type to match CWL spec

The position field should be i32 instead of u32 to match the CWL specification, which allows for negative positions.

-    pub position: Option<u32>,
+    pub position: Option<i32>,

Likely invalid or redundant comment.

zefiro-core/zefiro-cwl-parser/Cargo.toml Outdated Show resolved Hide resolved
zefiro-core/zefiro-cwl-parser/src/values/types.rs Outdated Show resolved Hide resolved
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
zefiro-core/zefiro-cwl-parser/src/schema/command_line_tool.rs (2)

16-16: Consider making id field optional

According to the CWL specification, the id field should be optional. Consider updating the field definition:

-    pub id: String,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub id: Option<String>,

38-40: Enhance error message with more context

The error message could be more descriptive to help users understand why Docker requirement is mandatory.

-            return Err(serde::ser::Error::custom(
-                "CommandLineTool must have DockerRequirement",
-            ));
+            return Err(serde::ser::Error::custom(
+                "CommandLineTool must have DockerRequirement in its requirements list. \
+                 This ensures proper container isolation and reproducibility.",
+            ));
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a7ba5af and c3b38c6.

📒 Files selected for processing (2)
  • zefiro-core/zefiro-cwl-parser/src/schema/command_line_tool.rs (1 hunks)
  • zefiro-core/zefiro-cwl-parser/src/values/types.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • zefiro-core/zefiro-cwl-parser/src/values/types.rs
🧰 Additional context used
📓 Learnings (1)
zefiro-core/zefiro-cwl-parser/src/schema/command_line_tool.rs (1)
Learnt from: nsyzrantsev
PR: zefiroproj/zefiro#4
File: zefiro-core/zefiro-cwl-parser/src/schema/command_line_tool.rs:52-59
Timestamp: 2025-01-04T17:12:06.568Z
Learning: The user indicated that fields like `label`, `secondaryFiles`, `streamable`, `doc`, `format`, `loadContents`, and `loadListing` are not required for the MVP version of the CommandOutputParameter struct.
🪛 GitHub Check: codecov/patch
zefiro-core/zefiro-cwl-parser/src/schema/command_line_tool.rs

[warning] 37-39: zefiro-core/zefiro-cwl-parser/src/schema/command_line_tool.rs#L37-L39
Added lines #L37 - L39 were not covered by tests


[warning] 44-44: zefiro-core/zefiro-cwl-parser/src/schema/command_line_tool.rs#L44
Added line #L44 was not covered by tests

🔇 Additional comments (5)
zefiro-core/zefiro-cwl-parser/src/schema/command_line_tool.rs (5)

53-62: LGTM! Well-structured MVP implementation

The CommandInputParameter struct includes essential fields while maintaining simplicity for the MVP version.


69-76: LGTM! Focused MVP implementation

The CommandOutputParameter struct maintains a clean, minimal interface with essential fields, aligning with the MVP goals.


83-92: LGTM! Clean binding implementation

The CommandLineBinding struct provides essential command-line binding functionality needed for the MVP.


99-105: LGTM! Concise output binding implementation

The OutputBinding struct effectively captures the essential output binding rules while maintaining simplicity for the MVP.


32-46: Add test coverage for CommandLineTool serialization

The serialization implementation lacks test coverage, particularly for:

  • Successful serialization with valid Docker requirement
  • Failed serialization when Docker requirement is missing
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 37-39: zefiro-core/zefiro-cwl-parser/src/schema/command_line_tool.rs#L37-L39
Added lines #L37 - L39 were not covered by tests


[warning] 44-44: zefiro-core/zefiro-cwl-parser/src/schema/command_line_tool.rs#L44
Added line #L44 was not covered by tests

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
.github/workflows/cd.yml (2)

28-44: Secure configuration approved with suggestions for robustness.

The publishing job is well-configured with secure practices:

  • Uses explicit manifest path
  • Employs --locked flag to prevent dependency tampering
  • Properly handles secrets

Consider these improvements for better robustness:

 - name: Publish the zefiro-cwl-parser library
   run: |
+    # Retry logic for network issues
+    for i in 1 2 3; do
       cargo publish --manifest-path zefiro-core/zefiro-cwl-parser/Cargo.toml \
-        --locked --token ${{ secrets.CARGO_REGISTRY_TOKEN }}
+        --locked --token ${{ secrets.CARGO_REGISTRY_TOKEN }} && break || \
+        if [ $i -lt 3 ]; then sleep 10; else exit 1; fi
+    done
+    
+    # Verify successful publish
+    cargo search zefiro-cwl-parser --limit 1 | grep -q "zefiro-cwl-parser = \"${RELEASE_VERSION#v}\""

Also add a timeout to the job:

 publish-crates-io:
   name: Publish on crates.io
   runs-on: ubuntu-22.04
+  timeout-minutes: 10
   needs: generate-changelog
🧰 Tools
🪛 yamllint (1.35.1)

[error] 44-44: no new line character at the end of file

(new-line-at-end-of-file)


44-44: Add newline at end of file.

The YAML file is missing a newline at the end, which is a POSIX requirement.

Add a newline at the end of the file.

🧰 Tools
🪛 yamllint (1.35.1)

[error] 44-44: no new line character at the end of file

(new-line-at-end-of-file)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c3b38c6 and f4735c7.

📒 Files selected for processing (2)
  • .github/workflows/cd.yml (1 hunks)
  • zefiro-core/zefiro-cwl-parser/Cargo.toml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • zefiro-core/zefiro-cwl-parser/Cargo.toml
🧰 Additional context used
🪛 yamllint (1.35.1)
.github/workflows/cd.yml

[error] 44-44: no new line character at the end of file

(new-line-at-end-of-file)

🔇 Additional comments (1)
.github/workflows/cd.yml (1)

26-26: LGTM! Command formatting improvement.

The git-cliff command formatting change improves readability while maintaining the same functionality.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
zefiro-core/zefiro-cwl-parser/src/values/document.rs (1)

74-76: Fix typo in error message

There's a typo in the error message: "dserialize" should be "serialize".

-            .map_err(|e| Error::msg(format!("Failed to dserialize CWL values to string: {}", e)))
+            .map_err(|e| Error::msg(format!("Failed to serialize CWL values to string: {}", e)))
zefiro-core/zefiro-cwl-parser/src/schema/document.rs (2)

11-11: Consider making supported versions configurable

The SUPPORTED_VERSIONS constant could be made configurable to support future CWL versions without code changes.

Consider using a configuration file or environment variable to specify supported versions.


36-53: Enhance error messages with more context

The error messages could be more descriptive by including the actual values encountered.

-            .ok_or_else(|| anyhow::anyhow!("Failed to determine CWL specification version."))?;
+            .ok_or_else(|| anyhow::anyhow!("Failed to determine CWL specification version. Document must include 'cwlVersion' field."))?;

-            Some(class) => bail!("Unsupported CWL document class: {class}"),
-            None => bail!("Failed to determine CWL document class."),
+            Some(class) => bail!("Unsupported CWL document class: '{}'. Supported classes are 'CommandLineTool' and 'Workflow'.", class),
+            None => bail!("Failed to determine CWL document class. Document must include 'class' field."),
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f4735c7 and 43cd16d.

📒 Files selected for processing (3)
  • zefiro-core/zefiro-cwl-parser/src/schema/command_line_tool.rs (1 hunks)
  • zefiro-core/zefiro-cwl-parser/src/schema/document.rs (1 hunks)
  • zefiro-core/zefiro-cwl-parser/src/values/document.rs (1 hunks)
🧰 Additional context used
📓 Learnings (1)
zefiro-core/zefiro-cwl-parser/src/schema/command_line_tool.rs (1)
Learnt from: nsyzrantsev
PR: zefiroproj/zefiro#4
File: zefiro-core/zefiro-cwl-parser/src/schema/command_line_tool.rs:52-59
Timestamp: 2025-01-04T17:12:06.568Z
Learning: The user indicated that fields like `label`, `secondaryFiles`, `streamable`, `doc`, `format`, `loadContents`, and `loadListing` are not required for the MVP version of the CommandOutputParameter struct.
🪛 GitHub Check: codecov/patch
zefiro-core/zefiro-cwl-parser/src/values/document.rs

[warning] 96-97: zefiro-core/zefiro-cwl-parser/src/values/document.rs#L96-L97
Added lines #L96 - L97 were not covered by tests

zefiro-core/zefiro-cwl-parser/src/schema/document.rs

[warning] 116-117: zefiro-core/zefiro-cwl-parser/src/schema/document.rs#L116-L117
Added lines #L116 - L117 were not covered by tests

🔇 Additional comments (9)
zefiro-core/zefiro-cwl-parser/src/values/document.rs (1)

96-98: Add test coverage for to_yaml method

The to_yaml method lacks test coverage. Consider adding tests to verify successful serialization and error handling.

Would you like me to help draft test cases for the to_yaml method?

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 96-97: zefiro-core/zefiro-cwl-parser/src/values/document.rs#L96-L97
Added lines #L96 - L97 were not covered by tests

zefiro-core/zefiro-cwl-parser/src/schema/document.rs (1)

116-118: Add test coverage for to_yaml method

The to_yaml method lacks test coverage. Consider adding tests for both successful serialization and error cases.

Would you like me to help draft test cases for the to_yaml method?

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 116-117: zefiro-core/zefiro-cwl-parser/src/schema/document.rs#L116-L117
Added lines #L116 - L117 were not covered by tests

zefiro-core/zefiro-cwl-parser/src/schema/command_line_tool.rs (7)

1-4: LGTM! Clean and well-organized imports.

The imports are minimal and well-structured, with good separation between internal crate dependencies and external ones.


29-38: LGTM! Well-structured input parameter implementation.

The struct contains all necessary fields for the MVP version, with proper use of raw identifiers and optional fields.


45-52: LGTM! Clean output parameter implementation.

The struct is well-designed for the MVP version, containing all necessary fields while maintaining simplicity.


75-81: LGTM! Clear and concise output binding implementation.

The struct is well-documented and implements the necessary output binding functionality for the MVP version.


59-68: ⚠️ Potential issue

Fix position type to match CWL spec

The position field should be i32 instead of u32 to match the CWL specification, which allows for negative positions.

Apply this change:

 pub struct CommandLineBinding {
     #[serde(skip_serializing_if = "Option::is_none")]
-    pub position: Option<u32>,
+    pub position: Option<i32>,

Likely invalid or redundant comment.


11-22: ⚠️ Potential issue

Add validation for the class field and make id optional

The struct has some deviations from the CWL spec:

  1. The class field should be validated to only accept "CommandLineTool"
  2. The id field should be optional

Here's the suggested implementation:

 #[serde(rename_all = "camelCase")]
 pub struct CommandLineTool {
     pub cwl_version: String,
-    pub class: String,
+    #[serde(default = "default_class")]
+    pub class: String,
     #[serde(skip_serializing_if = "Option::is_none")]
     pub doc: Option<Documentation>,
-    pub id: String,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub id: Option<String>,
     #[serde(skip_serializing_if = "Option::is_none")]
     pub label: Option<String>,
     pub inputs: Vec<CommandInputParameter>,
     pub outputs: Vec<CommandOutputParameter>,
     pub requirements: Vec<CommandLineToolRequirement>,
 }

+fn default_class() -> String {
+    "CommandLineTool".to_string()
+}

Likely invalid or redundant comment.


21-21: ⚠️ Potential issue

Add Docker requirement validation

The struct is missing the Docker requirement validation that was present in previous versions. This is important for ensuring that all CommandLineTools specify a container image.

Add the following implementation:

+impl CommandLineTool {
+    fn has_docker_requirement(&self) -> bool {
+        self.requirements.iter().any(|req| matches!(req, CommandLineToolRequirement::DockerRequirement { .. }))
+    }
+}
+
+impl Serialize for CommandLineTool {
+    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
+    where
+        S: serde::Serializer,
+    {
+        if !self.has_docker_requirement() {
+            return Err(serde::ser::Error::custom(
+                "CommandLineTool must have DockerRequirement in its requirements list. \
+                 Please add a DockerRequirement specifying the container image.",
+            ));
+        }
+        // Use the derived serialize implementation
+        #[derive(Serialize)]
+        struct Helper<'a>(#[serde(with = "self")] &'a CommandLineTool);
+        Helper(self).serialize(serializer)
+    }
+}

Likely invalid or redundant comment.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (5)
zefiro-core/zefiro-cwl-parser/src/values/document.rs (4)

12-17: Enhance struct documentation with more details.

While the current documentation provides a basic description, consider adding:

  • Example usage
  • Description of the flattened serialization behavior
  • Common use cases or scenarios
 /// Represents a collection of CWL input and output values as key-value pairs
+/// 
+/// # Examples
+/// 
+/// ```
+/// use zefiro_cwl_parser::values::document::CwlValues;
+/// 
+/// let yaml_input = r#"
+/// message: "Hello"
+/// number: 42
+/// "#;
+/// 
+/// let values = CwlValues::from_string(yaml_input).unwrap();
+/// assert!(values.contains_key("message"));
+/// ```
+/// 
+/// The internal HashMap is flattened during serialization, meaning nested structures
+/// are serialized at the root level of the YAML document.
 #[derive(Clone, Debug, Deserialize, Serialize)]
 pub struct CwlValues {

75-75: Fix typo in error message.

There's a typo in "dserialize".

-            .map_err(|e| Error::msg(format!("Failed to dserialize CWL values to string: {}", e)))
+            .map_err(|e| Error::msg(format!("Failed to serialize CWL values to string: {}", e)))

91-95: Improve example by checking the result.

The example should demonstrate proper error handling.

 let values = CwlValues::from_string(yaml_input).expect("Failed to serialize CWL values document");
 let mut tmpfile = tempfile::tempfile().unwrap();
 let mut writer = BufWriter::new(tmpfile);
-values.to_yaml(writer);
+values.to_yaml(writer).expect("Failed to write CWL values to file");

101-121: Enhance test coverage with additional test cases.

Consider adding:

  1. Negative test cases for invalid YAML
  2. Edge cases (empty document, large files)
  3. Verification of error messages

Example test cases:

#[test]
fn test_from_string_invalid_yaml() {
    let result = CwlValues::from_string("foo: [bar: baz}");
    assert!(result.is_err());
    assert!(result.unwrap_err().to_string().contains("Failed to deserialize"));
}

#[test]
fn test_from_path_nonexistent_file() {
    let result = CwlValues::from_path("nonexistent.yml");
    assert!(result.is_err());
    assert!(result.unwrap_err().to_string().contains("Failed to open file"));
}
zefiro-core/zefiro-cwl-parser/src/schema/document.rs (1)

31-34: Consider adding error context for file operations.

While the implementation is correct, consider adding context to file operation errors for better debugging.

-        let reader = BufReader::new(File::open(path)?);
+        let file = File::open(path)
+            .map_err(|e| anyhow::anyhow!("Failed to open file '{}': {}", path, e))?;
+        let reader = BufReader::new(file);
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 43cd16d and bca3551.

📒 Files selected for processing (2)
  • zefiro-core/zefiro-cwl-parser/src/schema/document.rs (1 hunks)
  • zefiro-core/zefiro-cwl-parser/src/values/document.rs (1 hunks)
🧰 Additional context used
🪛 GitHub Check: codecov/patch
zefiro-core/zefiro-cwl-parser/src/values/document.rs

[warning] 96-97: zefiro-core/zefiro-cwl-parser/src/values/document.rs#L96-L97
Added lines #L96 - L97 were not covered by tests

zefiro-core/zefiro-cwl-parser/src/schema/document.rs

[warning] 116-117: zefiro-core/zefiro-cwl-parser/src/schema/document.rs#L116-L117
Added lines #L116 - L117 were not covered by tests

🔇 Additional comments (7)
zefiro-core/zefiro-cwl-parser/src/values/document.rs (2)

19-25: LGTM! Clean and idiomatic Deref implementation.

The Deref implementation provides ergonomic access to the underlying HashMap while maintaining encapsulation.


96-98: Add test coverage for the to_yaml method.

The current test in test_save_schema_to_yaml doesn't verify the written content.

Consider adding a test that verifies the serialized output:

#[test]
fn test_to_yaml_content() {
    let input = r#"
    message: "Hello"
    number: 42
    "#;
    let values = CwlValues::from_string(input).unwrap();
    
    let mut output = Vec::new();
    values.to_yaml(&mut output).unwrap();
    
    let output_str = String::from_utf8(output).unwrap();
    assert!(output_str.contains("message: Hello"));
    assert!(output_str.contains("number: 42"));
}
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 96-97: zefiro-core/zefiro-cwl-parser/src/values/document.rs#L96-L97
Added lines #L96 - L97 were not covered by tests

zefiro-core/zefiro-cwl-parser/src/schema/document.rs (5)

1-9: Split the import statements into multiple lines for better readability.

The pipeline reports a formatting issue indicating that these chained imports need splitting.


13-19: Well-structured enum definition with appropriate attributes!

The CwlSchema enum is well-documented and correctly uses #[serde(untagged)] for flexible deserialization of CWL documents.


37-53: Robust implementation with good error handling!

The from_yaml method includes thorough validation of CWL version and class, with clear error messages.


121-127: Clean FromStr implementation!

Good use of delegation to the existing from_yaml method.


11-11: Consider supporting additional CWL versions.

The SUPPORTED_VERSIONS constant only includes "v1.2". Consider whether support for other CWL versions (e.g., v1.0, v1.1) would be beneficial for broader compatibility.

zefiro-core/zefiro-cwl-parser/src/schema/document.rs Outdated Show resolved Hide resolved
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (9)
zefiro-core/zefiro-cwl-parser/src/values/document.rs (5)

19-25: Add documentation for the Deref implementation.

While the implementation is correct, adding documentation explaining why Deref is implemented would help future maintainers understand the design decision.

+/// Implements Deref to allow direct access to the underlying HashMap operations,
+/// making CwlValues behave like a HashMap while maintaining type safety.
 impl Deref for CwlValues {

72-76: Fix typo in error message.

There's a typo in the error message: "dserialize" should be "serialize".

-            .map_err(|e| Error::msg(format!("Failed to dserialize CWL values to string: {}", e)))
+            .map_err(|e| Error::msg(format!("Failed to serialize CWL values to string: {}", e)))

78-95: Improve the example for to_yaml.

The current example doesn't handle the Result returned by to_yaml.

 let values = CwlValues::from_string(yaml_input).expect("Failed to serialize CWL values document");
 let mut tmpfile = tempfile::tempfile().unwrap();
 let mut writer = BufWriter::new(tmpfile);
-values.to_yaml(writer);
+values.to_yaml(writer).expect("Failed to write YAML to file");

101-152: Consider adding negative test cases.

The current tests cover the happy path well, but adding tests for error conditions would improve robustness:

  • Invalid YAML syntax
  • Missing required fields
  • Invalid file paths
  • I/O errors

Would you like help drafting these additional test cases?


1-152: Consider architectural improvements for better error handling and construction.

Two suggestions to enhance the codebase:

  1. Implement a custom error type instead of using anyhow::Error directly. This would provide more specific error handling and better documentation of possible failure modes.
  2. Consider adding a builder pattern for constructing complex CwlValues instances programmatically.

Would you like help implementing either of these suggestions?

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 96-97: zefiro-core/zefiro-cwl-parser/src/values/document.rs#L96-L97
Added lines #L96 - L97 were not covered by tests

zefiro-core/zefiro-cwl-parser/src/schema/document.rs (4)

11-11: Document version support strategy.

Consider adding documentation about:

  • Why only v1.2 is supported
  • Plans for supporting other versions
  • Migration path for users with different versions

31-34: Enhance error handling in from_path.

Consider wrapping the errors with more context:

 pub fn from_path(path: &str) -> Result<Self> {
-    let reader = BufReader::new(File::open(path)?);
-    Self::from_yaml(serde_yaml::from_reader(reader)?)
+    let file = File::open(path)
+        .map_err(|e| anyhow::anyhow!("Failed to open file '{}': {}", path, e))?;
+    let reader = BufReader::new(file);
+    Self::from_yaml(serde_yaml::from_reader(reader)
+        .map_err(|e| anyhow::anyhow!("Failed to parse YAML from '{}': {}", path, e))?)
 }

121-127: Enhance FromStr error handling.

Consider adding more context to the error:

 fn from_str(s: &str) -> Result<Self> {
-    Self::from_yaml(serde_yaml::from_str(s)?)
+    serde_yaml::from_str(s)
+        .map_err(|e| anyhow::anyhow!("Failed to parse string as YAML: {}", e))
+        .and_then(Self::from_yaml)
 }

129-185: Enhance test coverage with additional cases.

Consider adding tests for:

  • Invalid CWL versions
  • Unsupported document classes
  • Malformed YAML
  • Empty files
  • Files with missing required fields

Example test cases:

#[test]
fn test_unsupported_version() {
    let yaml = r#"
        cwlVersion: v1.0
        class: CommandLineTool
        id: step
    "#;
    assert!(matches!(
        CwlSchema::from_string(yaml),
        Err(e) if e.to_string().contains("Unsupported CWL version")
    ));
}

#[test]
fn test_unsupported_class() {
    let yaml = r#"
        cwlVersion: v1.2
        class: Expression
        id: step
    "#;
    assert!(matches!(
        CwlSchema::from_string(yaml),
        Err(e) if e.to_string().contains("Unsupported CWL document class")
    ));
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bca3551 and 01f2db8.

📒 Files selected for processing (2)
  • zefiro-core/zefiro-cwl-parser/src/schema/document.rs (1 hunks)
  • zefiro-core/zefiro-cwl-parser/src/values/document.rs (1 hunks)
🧰 Additional context used
🪛 GitHub Check: codecov/patch
zefiro-core/zefiro-cwl-parser/src/values/document.rs

[warning] 96-97: zefiro-core/zefiro-cwl-parser/src/values/document.rs#L96-L97
Added lines #L96 - L97 were not covered by tests

zefiro-core/zefiro-cwl-parser/src/schema/document.rs

[warning] 116-117: zefiro-core/zefiro-cwl-parser/src/schema/document.rs#L116-L117
Added lines #L116 - L117 were not covered by tests

🔇 Additional comments (4)
zefiro-core/zefiro-cwl-parser/src/values/document.rs (2)

1-17: Well-structured imports and struct definition!

The code demonstrates good practices with:

  • Appropriate use of imports
  • Clear documentation
  • Correct use of derive macros
  • Smart use of serde flattening for clean serialization

96-98: Add test coverage for the to_yaml method.

While there is an integration test in test_cwlvalues_to_yaml, direct unit tests for edge cases would be valuable.

Would you like help drafting unit tests that exercise error cases for to_yaml?

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 96-97: zefiro-core/zefiro-cwl-parser/src/values/document.rs#L96-L97
Added lines #L96 - L97 were not covered by tests

zefiro-core/zefiro-cwl-parser/src/schema/document.rs (2)

116-118: Improve test coverage for to_yaml method.

The to_yaml method's error handling isn't covered by tests. Consider adding test cases for:

  • Writing to a failing writer (e.g., a full disk simulation)
  • Invalid YAML serialization cases

The codecov report confirms this gap in coverage.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 116-117: zefiro-core/zefiro-cwl-parser/src/schema/document.rs#L116-L117
Added lines #L116 - L117 were not covered by tests


13-19: Verify untagged enum serialization behavior.

The #[serde(untagged)] attribute means the enum variant will be determined by attempting to match the fields rather than using a "type" field. This could lead to ambiguous cases if CommandLineTool and Workflow have similar fields.

Run this script to check for potential field overlaps:

✅ Verification successful

Let me try a different search pattern since the previous one didn't yield results.


Let me check one more thing to verify how these types are distinguished in CWL.


Based on the analysis of the structs and error handling code, I can now provide the final response:

#[serde(untagged)] usage is appropriate and safe

The untagged enum serialization is safe in this case because:

  • Both types have a distinct class field that serves as a discriminator
  • The structs have different field combinations (CommandLineTool has CommandInputParameter/CommandOutputParameter while Workflow has WorkflowStep)
  • The error handling code in document.rs shows there's validation of the class field to ensure proper type discrimination
  • The field structures are sufficiently different to prevent ambiguous matching
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for struct definitions to analyze field overlap
echo "CommandLineTool fields:"
ast-grep --pattern 'struct CommandLineTool {
  $$$
}'

echo -e "\nWorkflow fields:"
ast-grep --pattern 'struct Workflow {
  $$$
}'

Length of output: 210


Script:

#!/bin/bash
# Search for struct definitions in the entire codebase
echo "Looking for CommandLineTool struct:"
rg "struct CommandLineTool" -A 10

echo -e "\nLooking for Workflow struct:"
rg "struct Workflow" -A 10

Length of output: 6169


Script:

#!/bin/bash
# Look for class field values to confirm type discrimination
echo "Searching for class field assignments or validations:"
rg "class.*=.*\".*\"" -A 1
rg "class.*=.*CommandLineTool" -A 1
rg "class.*=.*Workflow" -A 1

Length of output: 635

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (7)
zefiro-core/zefiro-cwl-parser/src/values/document.rs (4)

12-17: Enhance struct documentation with more details.

While the current documentation is good, consider adding more details about:

  • The expected format of the key-value pairs
  • Common use cases
  • Examples of valid CWL values
 /// Represents a collection of CWL input and output values as key-value pairs
+/// 
+/// # Examples
+/// 
+/// ```
+/// # use zefiro_cwl_parser::values::document::CwlValues;
+/// let yaml_input = r#"
+/// message: "Hello World"
+/// input_file:
+///   class: File
+///   location: "s3://bucket/input.txt"
+/// "#;
+/// let values = CwlValues::from_string(yaml_input).unwrap();
+/// ```

74-75: Fix typo in error message.

There's a typo in the error message: "dserialize" should be "deserialize".

-            .map_err(|e| Error::msg(format!("Failed to dserialize CWL values to string: {}", e)))
+            .map_err(|e| Error::msg(format!("Failed to deserialize CWL values to string: {}", e)))

80-95: Improve doc example for to_yaml method.

The example code doesn't handle the Result returned by to_yaml, which could mask potential errors.

 let values = CwlValues::from_string(yaml_input).expect("Failed to serialize CWL values document");
 let mut tmpfile = tempfile::tempfile().unwrap();
 let mut writer = BufWriter::new(tmpfile);
-values.to_yaml(writer);
+values.to_yaml(writer).expect("Failed to write YAML");

101-131: Enhance test coverage and maintainability.

Consider the following improvements to the test suite:

  1. Add test cases for error scenarios (e.g., invalid YAML, file not found)
  2. Make test data paths configurable through environment variables or test constants
  3. Add tests for empty documents and edge cases

Here's an example of additional test cases:

#[test]
fn test_cwlvalues_from_path_not_found() {
    let result = CwlValues::from_path("non_existent_file.yml");
    assert!(result.is_err());
    assert!(result.unwrap_err().to_string().contains("Failed to open file"));
}

#[test]
fn test_cwlvalues_from_string_invalid_yaml() {
    let result = CwlValues::from_string("invalid: : yaml: :");
    assert!(result.is_err());
}

#[test]
fn test_cwlvalues_empty_document() {
    let result = CwlValues::from_string("{}").unwrap();
    assert_eq!(result.values.len(), 0);
}
zefiro-core/zefiro-cwl-parser/src/schema/document.rs (3)

11-11: Consider using a more specific type for version constants.

Instead of using a slice of string slices, consider using a const array for better type safety and zero-cost abstraction.

-const SUPPORTED_VERSIONS: &[&str] = &["v1.2"];
+const SUPPORTED_VERSIONS: [&str; 1] = ["v1.2"];

37-53: Consider using const string comparison for version check.

The version comparison could be more efficient using const string comparison.

-ensure!(
-    SUPPORTED_VERSIONS.contains(&version),
-    "Unsupported CWL version: {version}"
-);
+ensure!(
+    SUPPORTED_VERSIONS.iter().any(|&v| v.eq_ignore_ascii_case(version)),
+    "Unsupported CWL version: {version}"
+);

129-161: Add tests for error cases and edge cases.

While the happy path is well tested, consider adding tests for:

  1. Invalid CWL versions
  2. Unsupported document classes
  3. Malformed YAML
  4. File not found errors

Example test cases:

#[test]
fn test_invalid_cwl_version() {
    let yaml = r#"
        cwlVersion: v2.0
        class: CommandLineTool
        id: step
    "#;
    assert!(matches!(
        CwlSchema::from_string(yaml).unwrap_err().to_string(),
        s if s.contains("Unsupported CWL version")
    ));
}

#[test]
fn test_unsupported_class() {
    let yaml = r#"
        cwlVersion: v1.2
        class: UnsupportedType
        id: step
    "#;
    assert!(matches!(
        CwlSchema::from_string(yaml).unwrap_err().to_string(),
        s if s.contains("Unsupported CWL document class")
    ));
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 01f2db8 and b7d2389.

📒 Files selected for processing (2)
  • zefiro-core/zefiro-cwl-parser/src/schema/document.rs (1 hunks)
  • zefiro-core/zefiro-cwl-parser/src/values/document.rs (1 hunks)
🧰 Additional context used
🪛 GitHub Check: codecov/patch
zefiro-core/zefiro-cwl-parser/src/values/document.rs

[warning] 96-97: zefiro-core/zefiro-cwl-parser/src/values/document.rs#L96-L97
Added lines #L96 - L97 were not covered by tests

zefiro-core/zefiro-cwl-parser/src/schema/document.rs

[warning] 116-117: zefiro-core/zefiro-cwl-parser/src/schema/document.rs#L116-L117
Added lines #L116 - L117 were not covered by tests

🔇 Additional comments (6)
zefiro-core/zefiro-cwl-parser/src/values/document.rs (2)

19-25: Clean Deref implementation!

The Deref implementation provides ergonomic access to the underlying HashMap while maintaining encapsulation.


96-98: Add test coverage for error cases in to_yaml.

The current test only covers the success path. Consider adding tests for error cases, such as writing to a failed writer.

Would you like help drafting additional test cases for error scenarios?

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 96-97: zefiro-core/zefiro-cwl-parser/src/values/document.rs#L96-L97
Added lines #L96 - L97 were not covered by tests

zefiro-core/zefiro-cwl-parser/src/schema/document.rs (4)

1-9: LGTM! Imports are properly organized.

The imports are now correctly formatted and grouped according to Rust conventions.


13-19: LGTM! Well-designed enum with appropriate derives.

The enum is well-documented and correctly uses untagged serde representation for flexible deserialization of CWL documents.


27-27: Update example paths to be relative to the crate root.

The doc examples use incorrect relative paths. They should be relative to the crate root.

-let yaml_file = "examples/data/clt-step-schema.yml";
+let yaml_file = "zefiro-core/zefiro-cwl-parser/examples/data/clt-step-schema.yml";

Also applies to: 110-110


121-127: LGTM! Clean FromStr implementation.

The implementation correctly delegates to from_yaml and follows Rust error handling patterns.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
zefiro-core/zefiro-cwl-parser/src/schema/requirements.rs (1)

40-57: 🛠️ Refactor suggestion

Add maximum resource fields from CWL spec

Consider adding maximum resource fields:

  • coresMax
  • ramMax
  • tmpdirMax
  • outdirMax
🧹 Nitpick comments (10)
zefiro-core/zefiro-cwl-parser/src/schema/workflow.rs (3)

18-19: Consider validation for the id field.

The id field uses #[serde(default)] which defaults to an empty string. Consider making this Option<String> instead, as an empty string might not be a valid identifier in CWL.

-    #[serde(default)]
-    pub id: String,
+    pub id: Option<String>,

24-24: Consider validating workflow requirements for duplicates.

The requirements vector could potentially contain duplicate requirements. Consider adding validation during deserialization to ensure uniqueness.

You could implement custom deserialization or add a validation method:

impl Workflow {
    pub fn validate(&self) -> Result<(), String> {
        let mut seen = std::collections::HashSet::new();
        for req in &self.requirements {
            if !seen.insert(req.get_type()) {
                return Err(format!("Duplicate requirement type: {:?}", req.get_type()));
            }
        }
        Ok(())
    }
}

82-82: Enhance type safety for scatter_method.

Consider using an enum for scatter_method to restrict values to valid options: "dotproduct", "nested_crossproduct", "flat_crossproduct".

#[derive(Clone, Debug, Deserialize, Serialize)]
#[serde(rename_all = "snake_case")]
pub enum ScatterMethod {
    DotProduct,
    NestedCrossProduct,
    FlatCrossProduct,
}

Then update the field:

-    pub scatter_method: Option<String>,
+    pub scatter_method: Option<ScatterMethod>,
zefiro-core/zefiro-cwl-parser/src/schema/requirements.rs (4)

4-4: Consider supporting multiple CWL versions

The constant SUPPORTED_CWL_VERSIONS only includes v1.2. Consider adding support for earlier versions (v1.0, v1.1) to ensure broader compatibility with existing CWL workflows.


11-18: Add documentation link and consider additional workflow requirements

  1. Add the CWL specification link in the documentation comment.
  2. Consider implementing additional workflow requirements from the CWL spec:
  • SubworkflowFeatureRequirement
  • MultipleInputFeatureRequirement
  • StepInputExpressionRequirement

20-29: Add documentation link and consider additional command line tool requirements

  1. Add the CWL specification link in the documentation comment.
  2. Consider implementing additional requirements from the CWL spec:
  • InitialWorkDirRequirement
  • EnvVarRequirement
  • ShellCommandRequirement

101-108: Consider adding validation for WorkReuse

Consider implementing validation to ensure that when WorkReuse is specified, enable_reuse is true. This would prevent the redundant specification of WorkReuse with enable_reuse set to false.

Example implementation:

impl WorkReuse {
    pub fn validate(&self) -> Result<(), &'static str> {
        if !self.enable_reuse {
            return Err("WorkReuse specified but enable_reuse is false");
        }
        Ok(())
    }
}
zefiro-core/zefiro-cwl-parser/src/schema/document.rs (3)

94-97: Enhance error message in from_string.

The current error message wraps the underlying error but doesn't guide users on potential fixes. Consider adding common failure reasons to help users debug issues.

-            .map_err(|e| Error::msg(format!("Failed to parse CWL schema from string: {}", e)))
+            .map_err(|e| Error::msg(format!(
+                "Failed to parse CWL schema from string. Ensure the string contains valid YAML and includes required 'cwlVersion' and 'class' fields. Error: {}",
+                e
+            )))

104-115: Improve doc example to show error handling.

The example in the documentation doesn't demonstrate error handling and uses unwrap(). Consider showing proper error handling practices.

 /// ```
 /// use zefiro_cwl_parser::schema::document::CwlSchema;
 /// use std::fs::File;
 /// use std::io::BufWriter;
 ///
 /// let yaml_file = "examples/data/clt-step-schema.yml";
-/// let schema = CwlSchema::from_path(yaml_file).expect("Failed to serialize CWL schema document");
-/// let mut tmpfile = tempfile::tempfile().unwrap();
-/// let mut writer = BufWriter::new(tmpfile);
-/// schema.to_yaml(writer);
+/// # fn main() -> anyhow::Result<()> {
+/// let schema = CwlSchema::from_path(yaml_file)?;
+/// let tmpfile = tempfile::tempfile()?;
+/// let writer = BufWriter::new(tmpfile);
+/// schema.to_yaml(writer)?;
+/// # Ok(())
+/// # }
 /// ```

129-197: Add negative test cases for validation.

The test suite covers the happy path and write errors well, but could benefit from negative test cases for:

  1. Unsupported CWL versions
  2. Invalid class values
  3. Missing required fields

Here's a suggested test implementation:

#[test]
fn test_validation_errors() {
    // Test unsupported version
    let yaml_str = r#"
        cwlVersion: v0.1
        class: CommandLineTool
        id: step
    "#;
    assert!(CwlSchema::from_string(yaml_str).is_err());

    // Test invalid class
    let yaml_str = r#"
        cwlVersion: v1.2
        class: InvalidClass
        id: step
    "#;
    assert!(CwlSchema::from_string(yaml_str).is_err());

    // Test missing required fields
    let yaml_str = r#"
        id: step
    "#;
    assert!(CwlSchema::from_string(yaml_str).is_err());
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b7d2389 and 1fc7002.

📒 Files selected for processing (4)
  • zefiro-core/zefiro-cwl-parser/src/schema/command_line_tool.rs (1 hunks)
  • zefiro-core/zefiro-cwl-parser/src/schema/document.rs (1 hunks)
  • zefiro-core/zefiro-cwl-parser/src/schema/requirements.rs (1 hunks)
  • zefiro-core/zefiro-cwl-parser/src/schema/workflow.rs (1 hunks)
🧰 Additional context used
📓 Learnings (3)
zefiro-core/zefiro-cwl-parser/src/schema/requirements.rs (1)
Learnt from: nsyzrantsev
PR: zefiroproj/zefiro#4
File: zefiro-core/zefiro-cwl-parser/src/schema/requirements.rs:83-85
Timestamp: 2025-01-04T16:02:13.319Z
Learning: For the `ToolTimeLimit.timelimit` field in `zefiro-core/zefiro-cwl-parser/src/schema/requirements.rs`, the user indicated that the type can be either a string or a u32.
zefiro-core/zefiro-cwl-parser/src/schema/command_line_tool.rs (2)
Learnt from: nsyzrantsev
PR: zefiroproj/zefiro#4
File: zefiro-core/zefiro-cwl-parser/src/schema/command_line_tool.rs:52-59
Timestamp: 2025-01-04T17:12:06.568Z
Learning: The user indicated that fields like `label`, `secondaryFiles`, `streamable`, `doc`, `format`, `loadContents`, and `loadListing` are not required for the MVP version of the CommandOutputParameter struct.
Learnt from: nsyzrantsev
PR: zefiroproj/zefiro#4
File: zefiro-core/zefiro-cwl-parser/src/schema/command_line_tool.rs:1-81
Timestamp: 2025-01-04T19:10:32.963Z
Learning: Full validation of CWL files and related tests will be implemented in the `zefiro-cwl-validator` crate, rather than in the `zefiro-cwl-parser` crate.
zefiro-core/zefiro-cwl-parser/src/schema/workflow.rs (1)
Learnt from: nsyzrantsev
PR: zefiroproj/zefiro#4
File: zefiro-core/zefiro-cwl-parser/src/schema/workflow.rs:61-70
Timestamp: 2025-01-04T16:22:22.394Z
Learning: In `zefiro-core/zefiro-cwl-parser/src/schema/workflow.rs`, the user explicitly wants the `WorkflowStep::run` field to support only `CommandLineTool`.
🪛 GitHub Check: codecov/patch
zefiro-core/zefiro-cwl-parser/src/schema/document.rs

[warning] 116-117: zefiro-core/zefiro-cwl-parser/src/schema/document.rs#L116-L117
Added lines #L116 - L117 were not covered by tests

🔇 Additional comments (16)
zefiro-core/zefiro-cwl-parser/src/schema/workflow.rs (3)

37-47: LGTM! Well-structured input parameter implementation.

The WorkflowInputParameter struct is well-documented and correctly implements the CWL specification.


49-67: LGTM! Clean output parameter implementation with flexible output source handling.

The WorkflowOutputParameter and WorkflowOutputParameterOutputSource implementations are well-structured and provide good flexibility for handling both single and array output sources.


85-85: Fix incorrect documentation comment.

The comment incorrectly states it defines the input parameters of the workflow step (out section). It should reference the in section instead.

-/// Defines the input parameters of the workflow step (`out` section).
+/// Defines the input parameters of the workflow step (`in` section).
zefiro-core/zefiro-cwl-parser/src/schema/requirements.rs (4)

31-38: LGTM!

The implementation is clean and includes proper documentation with the spec link.


74-78: LGTM!

The implementation correctly follows the CWL spec and includes proper documentation.


80-93: LGTM!

The implementation correctly handles both numeric and expression-based time limits as per CWL spec, using an untagged enum for flexible parsing.


95-99: LGTM!

The implementation correctly follows the CWL spec and includes proper documentation.

zefiro-core/zefiro-cwl-parser/src/schema/command_line_tool.rs (7)

1-5: LGTM!

The imports are well-organized and correctly specify the required dependencies for CWL schema implementation.


6-28: LGTM!

The CommandLineTool struct is well-documented and correctly implements the CWL specification with appropriate derive macros and serde attributes.


40-54: LGTM!

The CommandInputParameter struct correctly implements the essential fields required for the MVP version.


56-68: LGTM!

The CommandOutputParameter struct correctly implements the essential fields required for the MVP version.


86-97: LGTM!

The OutputBinding struct correctly implements the essential fields required for the MVP version.


76-77: 🛠️ Refactor suggestion

Change position type to i32

According to the CWL specification, the position field should be an integer that can be negative. Currently, it's u32 which only allows positive numbers.

     #[serde(skip_serializing_if = "Option::is_none")]
-    pub position: Option<u32>,
+    pub position: Option<i32>,

Likely invalid or redundant comment.


18-19: 🛠️ Refactor suggestion

Make id field optional

According to the CWL specification, the id field is optional. Currently, it defaults to an empty string, which might not be the best representation of an optional field.

-    #[serde(default)]
-    pub id: String,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub id: Option<String>,

Likely invalid or redundant comment.

zefiro-core/zefiro-cwl-parser/src/schema/document.rs (2)

121-127: LGTM! Clean and efficient implementation.

The FromStr implementation correctly reuses the from_yaml method, maintaining consistency in parsing logic.


13-19: Consider potential deserialization ambiguity with untagged enum.

The #[serde(untagged)] attribute means serde will try to deserialize into each variant until one succeeds. While the class field check in from_yaml prevents ambiguity, direct deserialization through serde (e.g., from_string) might be ambiguous if a document could match both variants.

Let's verify if there are any cases where a document could be ambiguously parsed:

✅ Verification successful

The untagged enum deserialization is safe due to class field validation

After analyzing the code, I can confirm that the deserialization is safe. The CwlSchema enum uses two approaches for parsing:

  1. The primary from_yaml method explicitly checks the class field before attempting deserialization, ensuring unambiguous parsing:
match value.get("class").and_then(Value::as_str) {
  1. The from_string and FromStr implementations both ultimately use from_yaml, inheriting its safety guarantees.

Both CommandLineTool and Workflow structs have distinct required fields and structures, and the class field validation ensures correct variant selection, making the #[serde(untagged)] attribute safe to use in this context.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for CommandLineTool and Workflow struct definitions to analyze field overlap
ast-grep --pattern 'struct CommandLineTool {$$$}'
ast-grep --pattern 'struct Workflow {$$$}'

Length of output: 96


Script:

#!/bin/bash
# Let's try a different approach to find the struct definitions
rg "struct (CommandLineTool|Workflow)" -A 20

Length of output: 9041


Script:

#!/bin/bash
# Let's check how the class field is used in deserialization
rg "class.*=.*\"(Workflow|CommandLineTool)\"" -B 2 -A 2

Length of output: 55


Script:

#!/bin/bash
# Let's check the implementation of from_yaml or similar parsing methods
rg "fn from_yaml|fn from_str" -A 10

Length of output: 3836

@nsyzrantsev nsyzrantsev merged commit 2527839 into main Jan 5, 2025
7 of 8 checks passed
@nsyzrantsev nsyzrantsev deleted the 3-develop-a-crate-to-parse-cwl-documents branch January 5, 2025 18:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Develop a crate to parse CWL documents
1 participant