
masking from child sequences; DNSM model bugfix; Yun branch lengths #14

Merged
matsen merged 20 commits into main on Apr 23, 2024

Conversation

@matsen (Contributor) commented on Mar 5, 2024

  • take the mask from the child sequence (a rough sketch follows this list)
  • fix a bug in the DNSM models, which were interpreting "N" as an ambiguous amino acid
  • implement Yun-style branch lengths
  • eliminate the pre-training before branch length optimization
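
A minimal sketch of the masking idea: the per-site mask is taken from the child sequence, so positions where the child is ambiguous are dropped regardless of what the parent has there. The helper name below is hypothetical and not the actual netam API.

import torch

def mask_from_child(child_seq: str) -> torch.Tensor:
    # True at positions kept for training; False where the child base is ambiguous ("N").
    return torch.tensor([c != "N" for c in child_seq], dtype=torch.bool)

print(mask_from_child("ACNTACGN"))
# tensor([ True,  True, False,  True,  True,  True,  True, False])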

@matsen (Contributor, Author) commented on Mar 5, 2024

Yun's derivation:

[attached image: Yun's derivation]

@matsen requested review from ksung25 and mmjohn on Apr 18, 2024 at 12:03
@mmjohn left a comment

Looks great!

Running make test in my epam conda env gives the following errors on both ermine and local installs:

===================================== short test summary info ======================================
ERROR tests/test_dnsm.py::test_crepe_roundtrip - NameError: name 'training_method' is not defined
ERROR tests/test_netam.py::test_write_output - NameError: name 'training_method' is not defined
ERROR tests/test_netam.py::test_standardize_model_rates - NameError: name 'training_method' is not defined

@matsen (Contributor, Author) commented on Apr 19, 2024

Thanks for catching the test failure!! That was a spur-of-the-moment, last-minute change. Fixed now.

@mmjohn left a comment

Everything's working for me now.

@matsen (Contributor, Author) commented on Apr 23, 2024

I decided to eliminate the short training that ran before the first round of branch length optimization; it appeared here:

diff --git a/netam/framework.py b/netam/framework.py
index 42b348c..ec13b9d 100644
--- a/netam/framework.py
+++ b/netam/framework.py
@@ -656,9 +656,6 @@ class Burrito(ABC):
         else:
             raise ValueError(f"Unknown training method {training_method}")
         loss_history_l = []
-        self.mark_branch_lengths_optimized(0)
-        loss_history_l.append(self.train(3))
         optimize_branch_lengths()
         self.mark_branch_lengths_optimized(0)
         for cycle in range(cycle_count):

That pre-training was making the non-fixed SHM models harder to train, and an untrained model actually isn't so crazy scale-wise for the DNSM:

[attached image: scale comparison for an untrained DNSM model]

(Erick, this is here)
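
For reference, a rough paraphrase of the resulting flow; the names come from the diff above, while the loop body and the surrounding Burrito class are elided, so treat this as a sketch rather than the actual implementation.

def joint_train_after_change(burrito, cycle_count, optimize_branch_lengths):
    # 'burrito' stands in for the Burrito instance from netam/framework.py.
    loss_history_l = []
    # The 3-epoch warm-up (self.train(3)) that used to run here is gone, so branch
    # lengths are first optimized against the untrained model.
    optimize_branch_lengths()
    burrito.mark_branch_lengths_optimized(0)
    for cycle in range(cycle_count):
        # ... training / branch length optimization cycles continue as before
        pass
    return loss_history_l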

@matsen merged commit 8f0cbd9 into main on Apr 23, 2024