Fix Issues with Microsetta-Processing Scripts for American Gut Project Data #19
While initially running the TMI scripts to capture the American Gut Project microbial data on barnacle, I encountered runtime errors. These occurred with the default/suggested system variables provided on the GitHub page.
Lucas (@lpatel) and I identified and resolved two bugs in the following files:
05c.beta.sh:
Corrected the input parameter for the k_neighbors.py script. Originally, it used weighted_unifrac.qza as the distance matrix input, but this file is not mentioned anywhere else. I've updated this to weighted_normalized_unifrac.qza to match the expected data processing flow.
metadata_operations.py:
Implemented a conditional update to handle the case where the mapping dictionary is empty, which happens when the AGP study (10317) is used as the '$STUDIES' variable. This led to an unnecessary update of the BIOM table IDs when processing the AGP data. With this change, ID updates occur only when there is actually mapping data to apply, which prevents unnecessary calls to 'update_ids'.
Affected Files:
scripts/05c.beta.sh
scripts/metadata_operations.py
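For reviewers, the guard added to metadata_operations.py can be sketched roughly as follows. This is a minimal illustration and not the code in the diff: `FakeTable` and `apply_mapping` are hypothetical names, and `FakeTable` only stands in for a table object that exposes an `update_ids` method.

```python
# Hypothetical stand-in for the table object; it only records update calls.
class FakeTable:
    def __init__(self, sample_ids):
        self.sample_ids = list(sample_ids)
        self.update_calls = 0

    def update_ids(self, id_map):
        # Rename any ID present in id_map; leave the others unchanged.
        self.update_calls += 1
        self.sample_ids = [id_map.get(s, s) for s in self.sample_ids]
        return self


def apply_mapping(table, mapping):
    """Rename IDs only when there is mapping data to apply (hypothetical helper)."""
    if mapping:  # an empty dict is falsy, so the update is skipped
        table.update_ids(mapping)
    return table


# Empty mapping, as described for the AGP study: no update call is made.
agp = apply_mapping(FakeTable(["10317.A", "10317.B"]), {})

# Non-empty mapping: IDs are renamed as before.
other = apply_mapping(FakeTable(["s1", "s2"]), {"s1": "renamed.s1"})
```

Relying on the truthiness of the dictionary keeps the change to a single condition and leaves the behavior for studies with a non-empty mapping untouched.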
Testing:
The fixes were tested in the same environment where the issues were initially encountered. After the fixes, the scripts ran without any runtime errors and processed the American Gut Project data as expected.
Please review these changes for inclusion in the main branch to ensure a more reliable data processing experience for future users.