Fix Issues with Microsetta-Processing Scripts for American Gut Project Data #19

Open: wants to merge 1 commit into main

l1joseph

While running the TMI scripts to process the American Gut Project (AGP) microbial data on barnacle, I encountered runtime errors. These occurred with the default/suggested system variables provided on the GitHub page.

Lucas (@lpatel) and I identified and resolved two bugs in the following files:

  1. 05c.beta.sh:
    Corrected the distance-matrix input passed to the k_neighbors.py script. It previously pointed to weighted_unifrac.qza, a file that is not mentioned anywhere else in the pipeline. I've changed it to weighted_normalized_unifrac.qza to match the expected data processing flow.

  2. metadata_operations.py:
    Added a conditional check for the case where the mapping dictionary is empty, which happens when the AGP study (10317) is used as the '$STUDIES' variable. Previously, this led to an unnecessary update of the BIOM table IDs. With this change, IDs are updated only when there is mapping data to apply, so the 'update_ids' function is not called unnecessarily.
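The 05c.beta.sh bug was a step reading an input that no other part of the pipeline mentions. One way to surface that kind of mismatch early is a pre-flight check that every expected input exists before k_neighbors.py is launched. The Python sketch below is illustrative only: the `missing_inputs` helper is hypothetical and not part of the repository, and it assumes the beta-diversity artifacts sit in a single output directory.

```python
import tempfile
from pathlib import Path


def missing_inputs(directory, filenames):
    """Return the names in `filenames` that are not files under `directory`.

    Hypothetical pre-flight helper: call it with the artifacts a downstream
    step (such as k_neighbors.py) expects, before launching that step.
    """
    base = Path(directory)
    return [name for name in filenames if not (base / name).is_file()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        # Simulate an upstream step that wrote only the normalized matrix.
        (Path(out_dir) / "weighted_normalized_unifrac.qza").touch()

        print(missing_inputs(out_dir, ["weighted_unifrac.qza"]))
        # ['weighted_unifrac.qza']  (the old input name is not found)
        print(missing_inputs(out_dir, ["weighted_normalized_unifrac.qza"]))
        # []  (the corrected input name is found)
```

Failing fast on a non-empty result gives a clear "missing input" message instead of a runtime error deeper inside the downstream script.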

Affected Files:

  • scripts/05c.beta.sh
  • scripts/metadata_operations.py

Testing:
The fixes were tested in the same environment where the issues were initially encountered. Post-fix, the scripts ran successfully without any runtime errors, processing the American Gut Project data as expected.

Please review these changes for inclusion in the main branch to ensure a more reliable data processing experience for future users.

@lucaspatel

To be more specific about the metadata_operations.py change: when $STUDIES == 10317, the mapping object is empty and the pipeline fails. This PR fixes that.
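Below is a minimal Python sketch of the guard. The actual code in metadata_operations.py is not shown in this PR, so `apply_id_mapping` is a hypothetical name and `StubTable` is a stand-in for a biom-format `Table`. The sketch assumes the script renames sample IDs with `Table.update_ids`, whose default `strict=True` raises a `TableException` when a table ID is not a key in the mapping. With an empty mapping every ID is missing, which would be consistent with the failure described above, although the PR does not state the exact error.

```python
class TableException(Exception):
    """Stand-in for biom.exception.TableException (assumption for this sketch)."""


class StubTable:
    """Hypothetical stand-in for a biom Table; models only sample-ID renaming."""

    def __init__(self, sample_ids):
        self.sample_ids = list(sample_ids)

    def update_ids(self, id_map, axis="sample", strict=True, inplace=True):
        # Mirrors biom's documented strict mode: every current ID must be a key.
        # `axis` is accepted for signature parity but ignored in this stub.
        if strict:
            missing = [i for i in self.sample_ids if i not in id_map]
            if missing:
                raise TableException(f"no mapping for {len(missing)} ID(s)")
        new_ids = [id_map.get(i, i) for i in self.sample_ids]
        if inplace:
            self.sample_ids = new_ids
            return self
        return StubTable(new_ids)


def apply_id_mapping(table, mapping):
    """Rename sample IDs only when the mapping actually contains entries."""
    if not mapping:
        # Empty mapping (e.g. $STUDIES=10317): nothing to rename, so skip
        # update_ids instead of letting strict renaming fail.
        return table
    return table.update_ids(mapping, axis="sample", inplace=False)


if __name__ == "__main__":
    # Made-up sample IDs, for illustration only.
    table = StubTable(["s1", "s2"])
    print(apply_id_mapping(table, {}).sample_ids)  # ['s1', 's2']
    print(apply_id_mapping(table, {"s1": "a", "s2": "b"}).sample_ids)  # ['a', 'b']
```

Checking `if not mapping` before calling `update_ids` lets the table pass through unchanged, rather than failing under strict renaming, when there is nothing to map.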
