
Ability to parallelize between GPUs #30

Merged
31 commits merged into main on Jun 6, 2024
Conversation

@matsen (Contributor) commented Jun 6, 2024

  • factoring apart load_and_add_shm...
  • finding least used gpu and using it, with a default
  • branch lengths are tensors
  • simplifying device handling of crepes

matsen added 23 commits June 5, 2024 07:55
These changes update the `pick_device` function in `common.py` to improve CUDA device selection for a given job. The print statement now includes the job ID, providing more informative output.
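As a rough illustration of "least used GPU" selection, here is a minimal pure-Python sketch. The function name and signature are hypothetical; the repo's actual `pick_device` logic in `common.py` may differ (e.g. it may query `torch.cuda` or `nvidia-smi` directly).

```python
def pick_least_used_gpu(used_memory_per_gpu, default=0):
    """Return the index of the GPU with the least memory in use.

    `used_memory_per_gpu` is a list of per-device memory-used figures
    (e.g. gathered via `torch.cuda.memory_allocated` per device).
    Falls back to `default` when no GPUs are visible.
    """
    if not used_memory_per_gpu:
        return default
    # argmin over device indices by reported memory usage
    return min(range(len(used_memory_per_gpu)),
               key=used_memory_per_gpu.__getitem__)
```

The default fallback mirrors the "with a default" note in the change list above.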

@matsen (Contributor, Author) commented Jun 6, 2024

Dropped:

    def model_and_optimizer_to(self, device):
        """Move the model, optimizer state, and datasets to `device`."""
        self.model.to(device)
        # Optimizer state tensors (e.g. Adam moment estimates) are not
        # moved by `model.to`, so move them explicitly.
        for state in self.optimizer.state.values():
            for k, v in state.items():
                if isinstance(v, torch.Tensor):
                    state[k] = v.to(device)
        for dataset in [self.train_dataset, self.val_dataset]:
            if dataset is not None:
                dataset.to(device)

@matsen (Contributor, Author) commented Jun 6, 2024

Also deferred to a future issue:

    def dataset_of_pcp_df(pcp_df, branch_length_multiplier=5.0):
        return DNSMDataset.from_data(
            pcp_df["parents"],
            pcp_df["children"],
            pcp_df["rates"],
            pcp_df["subs_probs"],
            branch_length_multiplier=branch_length_multiplier,
        )


    def train_val_datasets_of_pcp_df(pcp_df, branch_length_multiplier=5.0):
        """Perform a train-val split based on an "in_train" column."""
        train_df = pcp_df[pcp_df["in_train"]].reset_index(drop=True)
        val_df = pcp_df[~pcp_df["in_train"]].reset_index(drop=True)

        val_dataset = dataset_of_pcp_df(
            val_df,
            branch_length_multiplier=branch_length_multiplier,
        )
        # With no training rows, return only the validation dataset.
        if len(train_df) == 0:
            return None, val_dataset
        train_dataset = dataset_of_pcp_df(
            train_df,
            branch_length_multiplier=branch_length_multiplier,
        )
        return train_dataset, val_dataset
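For intuition, the split logic above can be sketched with plain Python rows standing in for a pandas DataFrame. The helper name is hypothetical; only the `in_train` column comes from the snippet.

```python
def split_rows(rows):
    """Split rows on the boolean "in_train" field; return (train, val).

    Mirrors the snippet's behavior: when there are no training rows,
    the train half is None and only the validation rows are returned.
    """
    train = [r for r in rows if r["in_train"]]
    val = [r for r in rows if not r["in_train"]]
    return (train or None), val
```

Returning `None` for an empty training split lets callers skip training entirely (e.g. evaluation-only runs) without constructing an empty dataset.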

@matsen matsen marked this pull request as ready for review June 6, 2024 17:32
@matsen matsen merged commit c8d6ef4 into main Jun 6, 2024
1 check passed