Merge pull request #2 from cashfree/update_readme_1
updating the read me
dineshcashfree authored Nov 25, 2024
2 parents e1b5a2a + d660308 commit d460638
Showing 2 changed files with 26 additions and 17 deletions.
41 changes: 25 additions & 16 deletions README.md
@@ -5,9 +5,9 @@ FileGenie SDK is a Python library designed to simplify parsing files from AWS S3
 - **Multi-format Support:** Effortlessly parse files in formats such as TEXT, CSV, EXCEL, ZIP, XML, and PDF directly from AWS S3.
 - **Flexible Response Types:** Generate responses tailored to user needs, including DATAFRAME, JSON, or FILE outputs.
 - **Password-Protected Files:** Seamlessly parse files secured with passwords.
-- **Custom Edge Case Handling:** Apply user-defined custom functions to manage specific parsing and transformation needs, including data sanitization, value conversions, or reformatting date fields for consistency.
-AWS S3 Integration: Fetch files directly from AWS S3 buckets using IAM roles for secure access.
-Streamlined Configuration: Set up easily with minimal configuration, eliminating the need of writing parser for specific file type.
+- **Custom Edge Case Handling:** Apply user-defined custom functions to address specific data massaging and transformation requirements, such as sanitizing data, converting values, reformatting date fields etc.
+- **AWS S3 Integration:** Fetch files directly from AWS S3 buckets using IAM roles for secure access.
+- **Streamlined Configuration:** Set up easily with minimal configuration, eliminating the need of writing parser for specific file type.

 ### Installation
 Install the SDK using pip:
@@ -21,7 +21,7 @@ pip install file_genie
 - **Pandas:** '2.0.0'

 ### Getting Started
-- **Define Custom Edge Cases:**
+**Define Custom Edge Cases:**
 Let's say you need to sanitize columns (e.g., standardise column values to a common format before applying custom logic) during file parsing, you can define custom functions for the SDK to use.

 To implement this:
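The README describes passing user-defined edge-case functions to the SDK (referenced as `user_edge_cases` from an `edgeCases` module) but does not show what those functions look like. A minimal sketch, assuming a name-to-callable dict: the function names, the dict shape and the `apply_edge_cases` helper are hypothetical, and plain row dicts stand in for the pandas DataFrames FileGenie works with.

```python
# edgeCases.py: hypothetical sketch; FileGenie's real contract for
# user_edge_cases is not shown in this diff.
from datetime import datetime


def standardise_status(value: str) -> str:
    """Collapse whitespace to underscores and upper-case, e.g. ' settled ' -> 'SETTLED'."""
    return "_".join(value.split()).upper()


def reformat_date(value: str) -> str:
    """Convert a DD/MM/YYYY date string to ISO 8601 (YYYY-MM-DD)."""
    return datetime.strptime(value.strip(), "%d/%m/%Y").strftime("%Y-%m-%d")


# Assumed shape: a registry mapping a rule name to the function that implements it.
user_edge_cases = {
    "standardise_status": standardise_status,
    "reformat_date": reformat_date,
}


def apply_edge_cases(rows, column_rules, edge_cases):
    """Return new rows with each configured column passed through its edge-case function."""
    cleaned = []
    for row in rows:
        new_row = dict(row)  # copy so the caller's rows are not mutated
        for column, func_name in column_rules.items():
            if column in new_row:
                new_row[column] = edge_cases[func_name](new_row[column])
        cleaned.append(new_row)
    return cleaned


rows = [{"status": " partially settled ", "txn_date": "25/11/2024"}]
rules = {"status": "standardise_status", "txn_date": "reformat_date"}
print(apply_edge_cases(rows, rules, user_edge_cases))
# [{'status': 'PARTIALLY_SETTLED', 'txn_date': '2024-11-25'}]
```

Keeping the functions in a registry keyed by name lets a configuration refer to them as strings; whether FileGenie resolves edge cases this way is not confirmed by the diff.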
@@ -37,11 +37,11 @@ from edgeCases import user_edge_cases
 self.edge_cases = user_edge_cases
 ```

-- **Define the configuration required for file parsing logic and S3 bucket names**
+**Define the configuration required for file parsing logic and S3 bucket names**
 ```
 s3_config: {
-upload_bucket: reconciliation-live
-download_bucket: reconciliation-live
+upload_bucket: s3_bucket_name
+download_bucket: s3_bucket_name
 }
 file_config: {
 "file_source_1": {
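The configuration block in this hunk uses a loose, YAML-like notation (unquoted keys, no commas), so it is not valid Python as written. A sketch of the same structure as Python dicts, assuming reader functions are named by string and read parameters are given as a dict; `s3_config`, `upload_bucket`, `download_bucket`, `file_config`, `read_from_s3_func` and `parameters_for_read_s3` are names from the README, while `REQUIRED_S3_KEYS` and `validate_config` are hypothetical.

```python
# Hypothetical Python rendering of the README's configuration; value types are assumed.
s3_config = {
    "upload_bucket": "s3_bucket_name",  # placeholder bucket name from the README
    "download_bucket": "s3_bucket_name",
}

file_config = {
    "file_source_1": {
        "read_from_s3_func": "readFromS3",  # assumed: reader referenced by name
        "parameters_for_read_s3": {"sep": ",", "skiprows": 1},
    },
}

REQUIRED_S3_KEYS = {"upload_bucket", "download_bucket"}


def validate_config(s3_config: dict, file_config: dict) -> list:
    """Collect every problem found rather than stopping at the first one."""
    missing = sorted(REQUIRED_S3_KEYS - set(s3_config))
    problems = [f"s3_config missing '{key}'" for key in missing]
    for source, cfg in file_config.items():
        if "read_from_s3_func" not in cfg:
            problems.append(f"{source}: missing 'read_from_s3_func'")
    return problems


print(validate_config(s3_config, file_config))  # [] when the config is complete
```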
@@ -75,17 +75,26 @@ self.edge_cases = user_edge_cases
 },
 }
 ```
+**read_from_s3_func:** This filed in FileGenie configuration specifies the function to be used for parsing a specific file type from AWS S3. Depending on the file format, you can choose from the following available functions:

-- **Define a ParsedDataResponseType enum**
-```
-import enum
-class ParsedDataResponseType(enum.Enum):
-DATAFRAME="DATAFRAME"
-FILE="FILE"
-JSON="JSON"
-```
+- **readFromS3** - parse the TXT, EXCEL, CSV, XML, PDF files
+- **readZipFromS3** - parse the zip files
+- **read_complete_excel_file** - Use this function when working with EXCEL files containing multiple sheets.
+
+**parameters_for_read_s3:** This field in FileGenie configuration specifies the additional parameters required for reading the file such as password_protected, password, sep etc. you can choose from the following available params:
+- password_protected: If file is password protected or not
+- passowrd_secret_key: Secret key name for password.
+- skiprows: Rows to skip at the start.
+- sep: Delimiter for CSV parsing.
+- header: Row number(s) to use as column names.
+- has_header: Specify if the file has a header.
+- skip_header: Skip the header row during processing.
+- sheet_name: Target sheet in an Excel file.
+- parser_func: Custom parser function.
+- chunksize: Number of rows to read per chunk.
+- skip_footer: Rows to skip at the end.

-- **Import and initialise the file genie**
+**Import and initialise the file genie**
 ```
 from file_genie import FileGenie
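The reader functions and `parameters_for_read_s3` keys added in this hunk can be checked before a configuration reaches the SDK. A minimal sketch under stated assumptions: `choose_reader`, `build_file_source`, the extension-to-format mapping and the rule that `read_complete_excel_file` is used only for multi-sheet workbooks are hypothetical; only the three reader names and the parameter keys come from the README (including `passowrd_secret_key`, copied with the README's spelling).

```python
# Hypothetical helper; reader names and parameter keys are copied from the README.
import os

KNOWN_PARAMS = {
    "password_protected", "passowrd_secret_key",  # spelling kept as in the README
    "skiprows", "sep", "header", "has_header", "skip_header",
    "sheet_name", "parser_func", "chunksize", "skip_footer",
}

# Assumed mapping of the README's TXT/EXCEL/CSV/XML/PDF formats to file extensions.
TABULAR_EXTENSIONS = {".txt", ".csv", ".xls", ".xlsx", ".xml", ".pdf"}


def choose_reader(filename: str, sheet_count: int = 1) -> str:
    """Map a file name to one of the README's read_from_s3_func values."""
    ext = os.path.splitext(filename)[1].lower()
    if ext == ".zip":
        return "readZipFromS3"
    if ext in {".xls", ".xlsx"} and sheet_count > 1:
        return "read_complete_excel_file"  # multi-sheet workbooks
    if ext in TABULAR_EXTENSIONS:
        return "readFromS3"
    raise ValueError(f"unsupported file type: {ext or filename}")


def build_file_source(filename: str, sheet_count: int = 1, **params) -> dict:
    """Assemble one file_config entry, rejecting parameter names the README does not list."""
    unknown = set(params) - KNOWN_PARAMS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    if params.get("password_protected") and "passowrd_secret_key" not in params:
        raise ValueError("password_protected files need passowrd_secret_key")
    return {
        "read_from_s3_func": choose_reader(filename, sheet_count),
        "parameters_for_read_s3": params,
    }


print(build_file_source("settlements.csv", sep="|", skiprows=2))
print(build_file_source("report.xlsx", sheet_count=3))
```

Rejecting unknown keys early catches misspellings such as `delimiter` for `sep`; FileGenie itself may or may not perform a similar check.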
2 changes: 1 addition & 1 deletion setup.py
@@ -3,7 +3,7 @@

 setup(
 name="file_genie",
-version="0.0.2",
+version="0.0.3",
 description="File Genie is designed to parse various file types and transform them according to provided configuration",
 long_description=open("README.md", "r", encoding="utf-8").read(),
 long_description_content_type="text/markdown",
