This repository has been archived by the owner on Dec 23, 2024. It is now read-only.

Commit

style: reformat code and add doc tests (#12)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
	- Introduced new error variants for enhanced error handling in regex operations.
	- Added a structured logging system to improve progress tracking and message clarity.
	- Implemented new `FastqReader` and `FastqWriter` for improved FASTQ file handling.

- **Improvements**
	- Renamed functions for better clarity in compression type handling.
	- Refactored command-line argument handling for improved organization and readability.
	- Enhanced documentation with clearer installation instructions and project description.

- **Bug Fixes**
	- Updated logging mechanism to replace print statements with structured logger usage.

- **Refactor**
	- Major refactor of barcode handling logic, simplifying structures and methods.
	- Consolidated parameters in the command-line interface for better organization.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
nsyzrantsev authored Sep 2, 2024
1 parent e50eff6 commit 1fbc612
Showing 12 changed files with 856 additions and 514 deletions.
2 changes: 1 addition & 1 deletion Cargo.toml
@@ -3,7 +3,7 @@ name = "barkit"
version = "0.1.0" # managed by release.sh
edition = "2021"
authors = ["Nikita Syzrantsev [email protected]"]
description = "Tool to process barcodes in FASTQ"
description = "BarKit — a cross-platform and ultrafast toolkit for barcodes manipulation in FASTQ files"
license = "GPL-3.0"
readme = "README.md"
homepage = "https://github.com/nsyzrantsev/barkit"
32 changes: 24 additions & 8 deletions README.md
@@ -1,21 +1,37 @@
# BarKit

> [!WARNING]
> This tool is under development. Please use the first release version when it becomes available.
BarKit (Barcodes Toolkit) is a toolkit designed for manipulating FASTQ barcodes.
BarKit (**Bar**codes Tool**Kit**) is a toolkit designed for manipulating FASTQ barcodes.

## Installation

### From crates.io

Barkit can be installed from [`crates.io`](https://crates.io/crates/barkit) using `cargo`. This can be done with the following command:

```bash
cargo install barkit
```

## Extract Command
### Build from source

1. Clone the repository:

```bash
git clone https://github.com/nsyzrantsev/barkit
cd barkit/
```

2. Build:

```bash
cargo build --release && sudo mv target/release/barkit /usr/local/bin/
```

## Extract subcommand

The extract command is designed to parse barcode sequences from FASTQ reads using approximate regex matching based on a provided pattern.
The extract subcommand is designed to parse barcode sequences from FASTQ reads using approximate regex matching based on a provided pattern.

All parsed barcode sequences are moved to the read header with base quality separated by colons:
All parsed barcode sequences are moved to the read header with base quality, separated by colons:

```
@SEQ_ID UMI:ATGC:???? CB:ATGC:???? SB:ATGC:????
@@ -34,7 +50,7 @@ Parse the first twelve nucleotides as a UMI from each forward read:
barkit extract -1 <IN_FASTQ1> -2 <IN_FASTQ2> -p "^(?P<UMI>[ATGCN]{12})" -o <OUT_FASTQ1> -O <OUT_FASTQ2>
```
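
As a rough illustration of what the UMI example above does to a single read, here is a minimal Rust sketch that uses only the standard library. It is not BarKit's implementation: the `Record` type and the `move_prefix_to_header` function are hypothetical names, matching is exact rather than approximate, and it assumes that "moved to the read header" means the barcode bases and their qualities are also trimmed from the read.

```rust
/// A single FASTQ record (hypothetical type for this sketch, not BarKit's).
#[derive(Debug, Clone, PartialEq)]
pub struct Record {
    pub header: String,
    pub seq: String,
    pub qual: String,
}

/// Moves the first `len` bases into the header as `<tag>:<bases>:<quals>`.
///
/// Returns `None` when the read is shorter than `len`, when sequence and
/// quality lengths differ, or when the prefix contains a character outside
/// `ATGCN`. This mimics an exact (not approximate) match of
/// `^(?P<TAG>[ATGCN]{len})`.
pub fn move_prefix_to_header(rec: &Record, tag: &str, len: usize) -> Option<Record> {
    // ASCII check keeps `split_at` on valid character boundaries.
    if !rec.seq.is_ascii() || !rec.qual.is_ascii() {
        return None;
    }
    if rec.seq.len() < len || rec.qual.len() != rec.seq.len() {
        return None;
    }
    let (barcode, rest_seq) = rec.seq.split_at(len);
    if !barcode.bytes().all(|b| b"ATGCN".contains(&b)) {
        return None;
    }
    let (barcode_qual, rest_qual) = rec.qual.split_at(len);
    Some(Record {
        // Assumption: the tag is appended after a single space, as in the
        // header format shown in the README.
        header: format!("{} {}:{}:{}", rec.header, tag, barcode, barcode_qual),
        seq: rest_seq.to_string(),
        qual: rest_qual.to_string(),
    })
}
```

Calling `move_prefix_to_header(&record, "UMI", 12)` on a read whose header is `@SEQ_ID` would produce a header of the form `@SEQ_ID UMI:<12 bases>:<12 qualities>`. The real tool additionally tolerates mismatches through approximate regex matching, which this sketch does not attempt.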

Parse the first sixteen nucleotides as a cell barcode from each reverse read before the `atgccat` sequence:
Parse the first sixteen nucleotides as a cell barcode from each reverse read before the `atgccat` adapter sequence:

```bash
barkit extract -1 <IN_FASTQ1> -2 <IN_FASTQ2> -P "^(?P<CB>[ATGCN]{16})atgccat" -o <OUT_FASTQ1> -O <OUT_FASTQ2>
2 changes: 1 addition & 1 deletion barkit-extract/Cargo.toml
@@ -3,7 +3,7 @@ name = "barkit-extract"
version = "0.1.0" # managed by release.sh
edition = "2021"
authors = ["Nikita Syzrantsev [email protected]"]
description = "Tool to extract barcodes"
description = "Tool for extracting barcode nucleotide sequence according to a specified regex pattern"
license = "GPL-3.0"
readme = "../README.md"
homepage = "https://github.com/nsyzrantsev/barkit"
12 changes: 9 additions & 3 deletions barkit-extract/src/error.rs
@@ -12,10 +12,14 @@ pub enum Error {
BarcodeCaptureGroupNotFound(String),
#[error("Provided unexpected barcode capture group {0}")]
UnexpectedCaptureGroupName(String),
#[error("Failed to read a file: {0}")]
FileRead(#[from] std::io::Error),
#[error("I/O error: {0}")]
IO(#[from] std::io::Error),
#[error("No match")]
PatternNotMatched,
#[error("Fancy regex error: {0}")]
FancyRegex(#[from] fancy_regex::Error),
#[error("Failed to choose permutation mask")]
PermutationMaskSize,
}

impl Clone for Error {
@@ -30,8 +34,10 @@ impl Clone for Error {
Error::UnexpectedCaptureGroupName(capture_group) => {
Error::UnexpectedCaptureGroupName(capture_group.clone())
}
Error::FileRead(err) => Error::FileRead(err.kind().into()),
Error::IO(err) => Error::IO(err.kind().into()),
Error::PatternNotMatched => Error::PatternNotMatched,
Error::FancyRegex(err) => Error::FancyRegex(err.clone()),
Error::PermutationMaskSize => Error::PermutationMaskSize,
}
}
}
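
The hand-written `Clone` impl in this diff is needed because `std::io::Error` does not implement `Clone`, so the enum cannot simply derive it. The following standalone sketch shows the same pattern on a reduced enum. It is an illustration only: the two-variant `Error` type here is a simplified stand-in, and it implements `Display` and `From` by hand instead of using the `#[error]` and `#[from]` attributes seen in the diff.

```rust
use std::fmt;
use std::io;

/// Reduced stand-in for the crate's error enum (only two variants).
#[derive(Debug)]
pub enum Error {
    Io(io::Error),
    PatternNotMatched,
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::Io(err) => write!(f, "I/O error: {err}"),
            Error::PatternNotMatched => write!(f, "No match"),
        }
    }
}

impl From<io::Error> for Error {
    fn from(err: io::Error) -> Self {
        Error::Io(err)
    }
}

impl Clone for Error {
    fn clone(&self) -> Self {
        match self {
            // `io::Error` is not `Clone`; rebuild a new one from its kind.
            // The custom message or payload of the original is not copied.
            Error::Io(err) => Error::Io(err.kind().into()),
            Error::PatternNotMatched => Error::PatternNotMatched,
        }
    }
}
```

The trade-off is that a cloned I/O error keeps only its `ErrorKind`: a message such as a file path attached to the original error is lost in the copy, so the clone is best suited to cases where only the category of failure matters.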