Fix Issues with Microsetta-Processing Scripts for American Gut Project Data #19

l1joseph · 2024-03-10T03:45:06Z

While first running the TMI scripts on barnacle to capture the American Gut Project microbial data, I encountered runtime errors. These occurred with the default/suggested system variables provided on the GitHub page.

Lucas (@lpatel) and I identified and resolved two bugs in the following files:

05c.beta.sh:
Corrected the input parameter for the k_neighbors.py script. It originally used weighted_unifrac.qza as the distance matrix input, but that file is not mentioned anywhere else. I've updated it to weighted_normalized_unifrac.qza to match the expected data processing flow.
metadata_operations.py:
Implemented a conditional update to handle the case where the mapping dictionary is empty, which happens when using the AGP study (10317) as the '$STUDIES' variable. Previously this led to an unnecessary update of the BIOM table IDs. With this change, ID updates occur only when there is mapping data to apply, which prevents unnecessary calls to the 'update_ids' function.
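
For reviewers, the guard described for metadata_operations.py can be sketched roughly as below. This is a minimal illustration and not the actual diff: `apply_id_mapping` and `FakeTable` are hypothetical names, and the sketch only assumes that the table object exposes an `update_ids(id_map, axis=..., strict=..., inplace=...)` method similar to the one on `biom.Table`.

```python
def apply_id_mapping(table, mapping, axis="sample"):
    """Rename IDs on `table` only when `mapping` has entries.

    Hypothetical helper: `table` is assumed to expose an `update_ids`
    method similar to the one on `biom.Table`.
    """
    if not mapping:
        # Nothing to rename (e.g. STUDIES=10317), so skip the call entirely.
        return table
    return table.update_ids(mapping, axis=axis, strict=False, inplace=False)


class FakeTable:
    """Hypothetical stand-in for a BIOM table, used only for illustration."""

    def __init__(self, ids):
        self.ids = list(ids)
        self.calls = 0  # counts how often update_ids was invoked

    def update_ids(self, id_map, axis="sample", strict=True, inplace=True):
        self.calls += 1
        new_ids = [id_map.get(i, i) for i in self.ids]
        if inplace:
            self.ids = new_ids
            return self
        return FakeTable(new_ids)


table = FakeTable(["10317.a", "10317.b"])
unchanged = apply_id_mapping(table, {})                   # empty mapping: no call
renamed = apply_id_mapping(table, {"10317.a": "sample_a"})  # non-empty: IDs updated
```

With an empty mapping the table is returned untouched and `update_ids` is never called, which is the behaviour the fix is meant to guarantee for the AGP study.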

Affected Files:

scripts/05c.beta.sh
scripts/metadata_operations.py

Testing:
The fixes were tested in the same environment where the issues were initially encountered. Post-fix, the scripts ran successfully without any runtime errors, processing the American Gut Project data as expected.

Please review these changes for inclusion in the main branch to ensure a more reliable data processing experience for future users.

lucaspatel · 2024-03-11T21:07:55Z

Implemented a conditional update to address a situation where the mapping dictionary could be empty, when using the AGP studies(10317) as '$STUDIES' variable. This led to an unnecessary update to the Biom table IDs when using the AGP data for our '$STUDIES'. This change should make sure that ID updates only occur when there's actually mapping data to apply, thus preventing unnecessary calls to the 'update_ids' functions.

More specifically, when $STUDIES == 10317, the mapping object is empty and the pipeline fails. This amends that.

fixed bugs relating to qiita 10317

810f1fb
