-
Notifications
You must be signed in to change notification settings - Fork 133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: setting up pyright type checking and fixing typing errors #18
Conversation
@@ -462,11 +465,7 @@ def get_all_html(self, debug: bool = False, split_scripts: bool = False) -> str: | |||
if debug: | |||
display(HTML(html_string)) | |||
|
|||
if split_scripts: | |||
scripts, html_string = extract_and_remove_scripts(html_string) | |||
return scripts, html_string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This split_scripts
option causes the return type to be a tuple instead of a string, which breaks typing. It looks like this option is never used anyway throughout the code so I just deleted it here. If this is not OK I can try going back and casting everything to str
everywhere this function is called, but IMO it's probably cleaner to have functions have only a single return type in general.
Looks like tests are failing due to the huggingface hub being down 🤦♂️. https://twitter.com/huggingface/status/1762954032312639702 |
No worries! Can wait. |
@chanind Amazing work! I think we're likely to delete a lot of the SAE visualizer stuff (sorry for not telling you earlier) as the new version of SAE vis (https://github.com/callummcdougall/sae_vis) should more/less work with our code. I'll make an issue for making a script to show how to use this with those SAEs. Also #17 might cause some further type checking issues. Idk if it make more sense to merge this PR or that PR first, do you have thoughts? Many thanks! |
No worries! Go ahead and merge #17 first and I'll fix up any typing issues in this PR. That's likely easier than you having to figure out the typing stuff just to get the PR merged. It shouldn't be too much to fix up. |
@chanind other PR has been merged if you want to rebase :) |
|
||
assert sparse_autoencoder.cfg.d_sae is not None # keep pyright happy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is d_sae
really optional? It seems like throughout the code it's assumed to be present
self.activation_store, | ||
) = LMSparseAutoencoderSessionloader.load_session_from_pretrained(self.sae_path) | ||
self.sparse_autoencoder = sae_group.autoencoders[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to pull out just the first autoencoder from the group here, since it looks like this file is expecting a single autoencoder to come out of this function rather than a group. IIRC this file will be removed in the future anyway, so probably fine?
self.activation_store, | ||
) = LMSparseAutoencoderSessionloader.load_session_from_pretrained(self.sae_path) | ||
# TODO: handle multiple autoencoders | ||
self.sparse_autoencoder = sae_group.autoencoders[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to pull out just the first autoencoder from the group here as well. Is this file also going to be removed in the future?
👍 should be updated for the new changes. |
thanks @chanind |
chore: setting up pyright type checking and fixing typing errors
This PR sets up Pyright for type-checking, and attempts to fix all type errors in the codebase.
Some things to note:
Any
in later parts of the code.Any
when I didn't know what the correct type of things wasTODO
notes where there was some weird typing thing I didn't know how to handle properly.geom_median
dir is ignored by the type checking, since that's sort of independent of this codebase.closes #15