Adding per-base inference #9

Merged
merged 53 commits into from
Feb 14, 2024

Conversation

matsen
Contributor

@matsen matsen commented Feb 12, 2024

  • New family of models: RSXX, for Rate Substitution models. These also generate CSPs (conditional substitution probabilities).
  • Generalizing everything to make that work

Note this pattern, which appears throughout the code:

        # When we have an N, set all the CSP logits to 0, resulting in a uniform
        # prediction. There is nothing to predict here.
        csp_logits *= masks.unsqueeze(-1)
        # As described elsewhere, this makes the WT base have a probability of 0
        # after softmax.
        csp_logits += wt_base_modifier
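
To make the arithmetic concrete, here is a minimal single-site sketch in plain Python (not PyTorch, so it runs without dependencies). It is an illustration under assumptions, not the code in this PR: `csp_for_site` is a hypothetical helper, the mask is assumed to be 0 for an `N` site and 1 otherwise, and the wild-type modifier is assumed to be negative infinity at the wild-type base and 0 elsewhere (all zeros for an `N` site). The real `wt_base_modifier` may use a large finite negative value instead.

```python
import math

BASES = "ACGT"
NEG_INF = float("-inf")


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def csp_for_site(raw_logits, wt_base):
    """Hypothetical single-site version of the mask-then-modify pattern."""
    # Assumed mask convention: 0 for an ambiguous site (N), 1 otherwise.
    mask = 0.0 if wt_base == "N" else 1.0
    # Multiplying by the mask zeroes every logit at an N site,
    # so softmax returns a uniform distribution there.
    logits = [x * mask for x in raw_logits]
    # Assumed modifier: -inf at the wild-type base, 0 elsewhere.
    # For an N site no base matches, so the modifier is all zeros.
    modifier = [NEG_INF if b == wt_base else 0.0 for b in BASES]
    logits = [x + m for x, m in zip(logits, modifier)]
    return softmax(logits)


raw = [0.3, -1.2, 2.0, 0.5]
uniform = csp_for_site(raw, "N")   # every base gets probability 0.25
no_wt = csp_for_site(raw, "G")     # probability of G is exactly 0.0
```

The order of the two steps matters: masking first and adding the modifier second means the wild-type exclusion is never undone by the mask.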

@matsen
Contributor Author

matsen commented Feb 12, 2024

Dropping this model shape:

            f"{prename}_cnn_med_orig": models.CNNModel(
                kmer_length=3,
                kernel_size=11,
                embedding_dim=9,
                filter_count=9,
                dropout_prob=0.1,
            ),

@matsen
Contributor Author

matsen commented Feb 12, 2024

Here's a diagram of the RS CNNs:

[Image: rs-cnns]


@mmjohn mmjohn left a comment

I'm running into a few issues with the unit tests on ermine in the epam conda env.

First, I updated tests/test_framework.py by replacing create_mutation_and_base_indicators with encode_mut_pos_and_base (issues mentioned here). This test now passes.

Remaining errors:

FAILED tests/test_netam.py::test_write_output - RuntimeError: Parent directory _ignore does not exist.
FAILED tests/test_netam.py::test_crepe_roundtrip - RuntimeError: Parent directory _ignore does not exist.
ERROR tests/test_dnsm.py::test_crepe_roundtrip - FileNotFoundError: ``/Users/matsen/data/wyatt-10x-1p5m_pcp_2023-10-07.first100.shmple.hdf5`` does not ...

The first two can be solved by creating the directory. It might be more useful to check for the directory and create it if necessary, as we do here in epam.
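
One way such a check could look is sketched below. This is an assumption-laden illustration, not the epam code referred to above: `ensure_parent_dir` is a hypothetical helper name, and the output filename in the usage line is made up for the example.

```python
from pathlib import Path


def ensure_parent_dir(path):
    """Create the parent directory of `path` if it is missing.

    Hypothetical helper: returns `path` as a Path so it can be used inline.
    """
    path = Path(path)
    # parents=True creates intermediate directories as needed;
    # exist_ok=True makes the call a no-op when the directory already exists.
    path.parent.mkdir(parents=True, exist_ok=True)
    return path
```

A test or writer function could then call something like `ensure_parent_dir("_ignore/output_prefix")` before writing, so a fresh checkout no longer fails with "Parent directory _ignore does not exist."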

For the last issue, can this file live somewhere accessible?

@matsen matsen merged commit 4114994 into main Feb 14, 2024
3 participants