Actually doing merge
JoshEngels committed Jul 4, 2024
2 parents 52780c0 + 328e0de commit c362e81
Showing 29 changed files with 908 additions and 203 deletions.
86 changes: 86 additions & 0 deletions CHANGELOG.md



## v3.11.0 (2024-07-04)

### Feature

* feat: make pretrained sae directory docs page (#213)

* make pretrained sae directory docs page

* type issue weirdness

* type issue weirdness ([`b8a99ab`](https://github.com/jbloomAus/SAELens/commit/b8a99ab4dfe8f7790a3b15f41e351fbc3b82f1ab))


## v3.10.0 (2024-07-04)

### Feature

* feat: make activations_store restart the dataset when it runs out (#207), as sketched below this list

* make activations_store restart the dataset when it runs out

* remove misleading comments

* allow StopIteration to bubble up where appropriate

* add test to ensure that StopIteration is raised

* formatting

* more formatting

* format tweak so we can re-try ci

* add deps back ([`91f4850`](https://github.com/jbloomAus/SAELens/commit/91f48502c39cd573d5f28aba2f3295c7694112e6))

* feat: allow models to be passed in as overrides (#210) ([`dd95996`](https://github.com/jbloomAus/SAELens/commit/dd95996efaa46c779b85ead9e52a8342869cfc24))
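
A conceptual sketch of the restart behavior from the first feature above (this is not the library's actual implementation; the attribute names are hypothetical):

```python
import warnings


def next_batch(store):
    """Fetch the next batch, restarting the dataset once if it is exhausted."""
    try:
        return next(store.iterator)
    except StopIteration:
        warnings.warn("Dataset exhausted; restarting from the beginning.")
        store.iterator = iter(store.dataset)  # hypothetical attributes
        # If the dataset yields nothing even after a restart, let the second
        # StopIteration bubble up to the caller.
        return next(store.iterator)
```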

### Fix

* fix: Activation store factor unscaling fold fix (#212)

* add unscaling to evals

* fix act norm unscaling missing

* improved variance explained, still off for that prompt

* format

* why suddenly a typingerror and only in CI? ([`1db84b5`](https://github.com/jbloomAus/SAELens/commit/1db84b5ca4ab82fae9edbe98c1e9a563ed1eb3c9))


## v3.9.2 (2024-07-03)

### Fix

* fix: Gated SAE Note Loading (#211)

* fix: add tests, make pass

* not in ([`b083feb`](https://github.com/jbloomAus/SAELens/commit/b083feb5ffb5b7f45669403786c3c7593aa1d3ba))

### Unknown

* SAETrainingRunner takes optional HFDataset (#206)

* SAETrainingRunner takes optional HFDataset

* more explicit errors when the buffer is too large for the dataset

* format

* add warnings when a new dataset is added

* replace default dataset with empty string

* remove valueerror ([`2c8fb6a`](https://github.com/jbloomAus/SAELens/commit/2c8fb6aeed214ff47dccfe427eb2881aca4e6808))
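
A minimal sketch of the new call pattern (the keyword name `override_dataset` and the small example dataset are assumptions, not confirmed by this changelog):

```python
from datasets import load_dataset

from sae_lens import LanguageModelSAERunnerConfig, SAETrainingRunner

# Pass a pre-loaded Hugging Face dataset directly to the runner instead of a
# dataset path; per the notes above, the default dataset path is now "".
dataset = load_dataset("NeelNanda/pile-10k", split="train")
cfg = LanguageModelSAERunnerConfig(dataset_path="")
runner = SAETrainingRunner(cfg, override_dataset=dataset)  # assumed kwarg name
sae = runner.run()
```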


## v3.9.1 (2024-07-01)

### Fix

* fix: pin typing-extensions version (#205) ([`3f0e4fe`](https://github.com/jbloomAus/SAELens/commit/3f0e4fe9e1a353e8b9563567919734af662ab69d))


## v3.9.0 (2024-07-01)

### Feature
128 changes: 128 additions & 0 deletions docs/generate_sae_table.py
# type: ignore
import json
from pathlib import Path

import pandas as pd
import yaml
from huggingface_hub import hf_hub_download
from tqdm import tqdm

from sae_lens import SAEConfig
from sae_lens.toolkit.pretrained_sae_loaders import (
    get_sae_config_from_hf,
    handle_config_defaulting,
)

# Config fields to surface as columns in the generated tables.
INCLUDED_CFG = [
    # "id",
    # "architecture",
    # "model_name",
    "hook_name",
    "hook_layer",
    "d_sae",
    "context_size",
    "dataset_path",
    "normalize_activations",
]


def on_pre_build(config):
    print("Generating SAE table...")
    generate_sae_table()
    print("SAE table generation complete.")


def generate_sae_table():
    # Read the YAML file
    yaml_path = Path("sae_lens/pretrained_saes.yaml")
    with open(yaml_path, "r") as file:
        data = yaml.safe_load(file)

    # Start the Markdown content
    markdown_content = "# Pretrained SAEs\n\n"
    markdown_content += "This is a list of SAEs importable from the SAELens package. Click on each link for more details.\n\n"
    markdown_content += "*This file contains the contents of `sae_lens/pretrained_saes.yaml` in Markdown*\n\n"

    # Generate content for each model
    for model_name, model_info in tqdm(data["SAE_LOOKUP"].items()):
        repo_link = f"https://huggingface.co/{model_info['repo_id']}"
        markdown_content += f"## [{model_name}]({repo_link})\n\n"
        markdown_content += f"- **Huggingface Repo**: {model_info['repo_id']}\n"
        markdown_content += f"- **model**: {model_info['model']}\n"

        if "links" in model_info:
            markdown_content += "- **Additional Links**:\n"
            for link_type, url in model_info["links"].items():
                markdown_content += f"  - [{link_type.capitalize()}]({url})\n"

        markdown_content += "\n"

        # Fetch and normalize the config for each SAE in this release.
        for info in tqdm(model_info["saes"]):
            # TODO: remove this special case by overriding the config
            # explicitly in the YAML.
            if model_info.get("conversion_func") == "connor_rob_hook_z":
                repo_id = model_info["repo_id"]
                folder_name = info["path"]
                config_path = folder_name.split(".pt")[0] + "_cfg.json"
                config_path = hf_hub_download(repo_id, config_path)
                with open(config_path, "r") as f:
                    old_cfg_dict = json.load(f)

                cfg = {
                    "architecture": "standard",
                    "d_in": old_cfg_dict["act_size"],
                    "d_sae": old_cfg_dict["dict_size"],
                    "dtype": "float32",
                    "device": "cpu",
                    "model_name": "gpt2-small",
                    "hook_name": old_cfg_dict["act_name"],
                    "hook_layer": old_cfg_dict["layer"],
                    "hook_head_index": None,
                    "activation_fn_str": "relu",
                    "apply_b_dec_to_input": True,
                    "finetuning_scaling_factor": False,
                    "sae_lens_training_version": None,
                    "prepend_bos": True,
                    "dataset_path": "Skylion007/openwebtext",
                    "context_size": 128,
                    "normalize_activations": "none",
                    "dataset_trust_remote_code": True,
                }
            else:
                cfg = get_sae_config_from_hf(
                    model_info["repo_id"],
                    info["path"],
                )
            cfg = handle_config_defaulting(cfg)
            cfg = SAEConfig.from_dict(cfg).to_dict()
            info.update(cfg)

        # Build a DataFrame of the SAEs and keep only the INCLUDED_CFG columns.
        df = pd.DataFrame(model_info["saes"])
        df = df[INCLUDED_CFG]

        # Render the Markdown table.
        table = df.to_markdown(index=False)
        markdown_content += table + "\n\n"

    # Write the content to a Markdown file
    output_path = Path("docs/sae_table.md")
    with open(output_path, "w") as file:
        file.write(markdown_content)


if __name__ == "__main__":
    generate_sae_table()
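
The table-rendering step above reduces to `pandas.DataFrame.to_markdown`, which depends on the optional `tabulate` package. A minimal, self-contained illustration with made-up rows:

```python
import pandas as pd

saes = [
    {"hook_name": "blocks.0.hook_resid_pre", "hook_layer": 0, "d_sae": 24576},
    {"hook_name": "blocks.1.hook_resid_pre", "hook_layer": 1, "d_sae": 24576},
]
df = pd.DataFrame(saes)
# Prints a pipe-delimited Markdown table like those in docs/sae_table.md.
print(df.to_markdown(index=False))
```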
173 changes: 173 additions & 0 deletions docs/sae_table.md
# Pretrained SAEs

This is a list of SAEs importable from the SAELens package. Click on each link for more details.

*This file contains the contents of `sae_lens/pretrained_saes.yaml` in Markdown*
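
Each release below can be loaded by name. A minimal sketch using `SAE.from_pretrained`, where the release is the section heading and the SAE id is assumed here to match the `hook_name` column:

```python
from sae_lens import SAE

# Returns the SAE, its config dict, and (if available) a feature sparsity tensor.
sae, cfg_dict, sparsity = SAE.from_pretrained(
    release="gpt2-small-res-jb",
    sae_id="blocks.8.hook_resid_pre",
    device="cpu",
)
print(cfg_dict["d_sae"])  # 24576, per the table below
```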

## [gpt2-small-res-jb](https://huggingface.co/jbloom/GPT2-Small-SAEs-Reformatted)

- **Huggingface Repo**: jbloom/GPT2-Small-SAEs-Reformatted
- **model**: gpt2-small
- **Additional Links**:
- [Model](https://huggingface.co/gpt2)
- [Dashboards](https://www.neuronpedia.org/gpt2sm-res-jb)
- [Publication](https://www.lesswrong.com/posts/f9EgfLSurAiqRJySD/open-source-sparse-autoencoders-for-all-residual-stream)

| hook_name | hook_layer | d_sae | context_size | normalize_activations |
|:--------------------------|-------------:|--------:|---------------:|:------------------------|
| blocks.0.hook_resid_pre | 0 | 24576 | 128 | none |
| blocks.1.hook_resid_pre | 1 | 24576 | 128 | none |
| blocks.2.hook_resid_pre | 2 | 24576 | 128 | none |
| blocks.3.hook_resid_pre | 3 | 24576 | 128 | none |
| blocks.4.hook_resid_pre | 4 | 24576 | 128 | none |
| blocks.5.hook_resid_pre | 5 | 24576 | 128 | none |
| blocks.6.hook_resid_pre | 6 | 24576 | 128 | none |
| blocks.7.hook_resid_pre | 7 | 24576 | 128 | none |
| blocks.8.hook_resid_pre | 8 | 24576 | 128 | none |
| blocks.9.hook_resid_pre | 9 | 24576 | 128 | none |
| blocks.10.hook_resid_pre | 10 | 24576 | 128 | none |
| blocks.11.hook_resid_pre | 11 | 24576 | 128 | none |
| blocks.11.hook_resid_post | 11 | 24576 | 128 | none |

## [gpt2-small-hook-z-kk](https://huggingface.co/ckkissane/attn-saes-gpt2-small-all-layers)

- **Huggingface Repo**: ckkissane/attn-saes-gpt2-small-all-layers
- **model**: gpt2-small
- **Additional Links**:
- [Model](https://huggingface.co/gpt2)
- [Dashboards](https://www.neuronpedia.org/gpt2sm-kk)
- [Publication](https://www.lesswrong.com/posts/FSTRedtjuHa4Gfdbr/attention-saes-scale-to-gpt-2-small)

| hook_name | hook_layer | d_sae | context_size | normalize_activations |
|:----------------------|-------------:|--------:|---------------:|:------------------------|
| blocks.0.attn.hook_z | 0 | 24576 | 128 | none |
| blocks.1.attn.hook_z | 1 | 24576 | 128 | none |
| blocks.2.attn.hook_z | 2 | 24576 | 128 | none |
| blocks.3.attn.hook_z | 3 | 24576 | 128 | none |
| blocks.4.attn.hook_z | 4 | 24576 | 128 | none |
| blocks.5.attn.hook_z | 5 | 49152 | 128 | none |
| blocks.6.attn.hook_z | 6 | 24576 | 128 | none |
| blocks.7.attn.hook_z | 7 | 49152 | 128 | none |
| blocks.8.attn.hook_z | 8 | 24576 | 128 | none |
| blocks.9.attn.hook_z | 9 | 24576 | 128 | none |
| blocks.10.attn.hook_z | 10 | 24576 | 128 | none |
| blocks.11.attn.hook_z | 11 | 24576 | 128 | none |
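
These attention SAEs read the concatenation of all per-head `z` vectors at a layer. A sketch of encoding cached activations, assuming the SAE id matches the hook name and `transformer_lens` is available:

```python
from transformer_lens import HookedTransformer

from sae_lens import SAE

model = HookedTransformer.from_pretrained("gpt2")
sae, _, _ = SAE.from_pretrained(
    release="gpt2-small-hook-z-kk",
    sae_id="blocks.5.attn.hook_z",  # assumed id
)
_, cache = model.run_with_cache("The quick brown fox")
z = cache["blocks.5.attn.hook_z"]      # [batch, pos, n_heads, d_head]
feats = sae.encode(z.flatten(-2, -1))  # concatenate heads into d_in, then encode
```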

## [gpt2-small-mlp-tm](https://huggingface.co/tommmcgrath/gpt2-small-mlp-out-saes)

- **Huggingface Repo**: tommmcgrath/gpt2-small-mlp-out-saes
- **model**: gpt2-small
- **Additional Links**:
- [Model](https://huggingface.co/gpt2)

| hook_name | hook_layer | d_sae | context_size | normalize_activations |
|:-----------------------|-------------:|--------:|---------------:|:-------------------------|
| blocks.0.hook_mlp_out | 0 | 24576 | 512 | expected_average_only_in |
| blocks.1.hook_mlp_out | 1 | 24576 | 512 | expected_average_only_in |
| blocks.2.hook_mlp_out | 2 | 24576 | 512 | expected_average_only_in |
| blocks.3.hook_mlp_out | 3 | 24576 | 512 | expected_average_only_in |
| blocks.4.hook_mlp_out | 4 | 24576 | 512 | expected_average_only_in |
| blocks.5.hook_mlp_out | 5 | 24576 | 512 | expected_average_only_in |
| blocks.6.hook_mlp_out | 6 | 24576 | 512 | expected_average_only_in |
| blocks.7.hook_mlp_out | 7 | 24576 | 512 | expected_average_only_in |
| blocks.8.hook_mlp_out | 8 | 24576 | 512 | expected_average_only_in |
| blocks.9.hook_mlp_out | 9 | 24576 | 512 | expected_average_only_in |
| blocks.10.hook_mlp_out | 10 | 24576 | 512 | expected_average_only_in |
| blocks.11.hook_mlp_out | 11 | 24576 | 512 | expected_average_only_in |

## [gpt2-small-res-jb-feature-splitting](https://huggingface.co/jbloom/GPT2-Small-Feature-Splitting-Experiment-Layer-8)

- **Huggingface Repo**: jbloom/GPT2-Small-Feature-Splitting-Experiment-Layer-8
- **model**: gpt2-small
- **Additional Links**:
- [Model](https://huggingface.co/gpt2)
- [Dashboards](https://www.neuronpedia.org/gpt2sm-rfs-jb)

| hook_name | hook_layer | d_sae | context_size | normalize_activations |
|:------------------------|-------------:|--------:|---------------:|:------------------------|
| blocks.8.hook_resid_pre | 8 | 768 | 128 | none |
| blocks.8.hook_resid_pre | 8 | 1536 | 128 | none |
| blocks.8.hook_resid_pre | 8 | 3072 | 128 | none |
| blocks.8.hook_resid_pre | 8 | 6144 | 128 | none |
| blocks.8.hook_resid_pre | 8 | 12288 | 128 | none |
| blocks.8.hook_resid_pre | 8 | 24576 | 128 | none |
| blocks.8.hook_resid_pre | 8 | 49152 | 128 | none |
| blocks.8.hook_resid_pre | 8 | 98304 | 128 | none |

## [gpt2-small-resid-post-v5-32k](https://huggingface.co/jbloom/GPT2-Small-OAI-v5-32k-resid-post-SAEs)

- **Huggingface Repo**: jbloom/GPT2-Small-OAI-v5-32k-resid-post-SAEs
- **model**: gpt2-small

| hook_name | hook_layer | d_sae | context_size | normalize_activations |
|:--------------------------|-------------:|--------:|---------------:|:------------------------|
| blocks.0.hook_resid_post | 0 | 32768 | 64 | layer_norm |
| blocks.1.hook_resid_post | 1 | 32768 | 64 | layer_norm |
| blocks.2.hook_resid_post | 2 | 32768 | 64 | layer_norm |
| blocks.3.hook_resid_post | 3 | 32768 | 64 | layer_norm |
| blocks.4.hook_resid_post | 4 | 32768 | 64 | layer_norm |
| blocks.5.hook_resid_post | 5 | 32768 | 64 | layer_norm |
| blocks.6.hook_resid_post | 6 | 32768 | 64 | layer_norm |
| blocks.7.hook_resid_post | 7 | 32768 | 64 | layer_norm |
| blocks.8.hook_resid_post | 8 | 32768 | 64 | layer_norm |
| blocks.9.hook_resid_post | 9 | 32768 | 64 | layer_norm |
| blocks.10.hook_resid_post | 10 | 32768 | 64 | layer_norm |
| blocks.11.hook_resid_post | 11 | 32768 | 64 | layer_norm |

## [gpt2-small-resid-post-v5-128k](https://huggingface.co/jbloom/GPT2-Small-OAI-v5-128k-resid-post-SAEs)

- **Huggingface Repo**: jbloom/GPT2-Small-OAI-v5-128k-resid-post-SAEs
- **model**: gpt2-small

| hook_name | hook_layer | d_sae | context_size | normalize_activations |
|:--------------------------|-------------:|--------:|---------------:|:------------------------|
| blocks.0.hook_resid_post | 0 | 131072 | 64 | layer_norm |
| blocks.1.hook_resid_post | 1 | 131072 | 64 | layer_norm |
| blocks.2.hook_resid_post | 2 | 131072 | 64 | layer_norm |
| blocks.3.hook_resid_post | 3 | 131072 | 64 | layer_norm |
| blocks.4.hook_resid_post | 4 | 131072 | 64 | layer_norm |
| blocks.5.hook_resid_post | 5 | 131072 | 64 | layer_norm |
| blocks.6.hook_resid_post | 6 | 131072 | 64 | layer_norm |
| blocks.7.hook_resid_post | 7 | 131072 | 64 | layer_norm |
| blocks.8.hook_resid_post | 8 | 131072 | 64 | layer_norm |
| blocks.9.hook_resid_post | 9 | 131072 | 64 | layer_norm |
| blocks.10.hook_resid_post | 10 | 131072 | 64 | layer_norm |
| blocks.11.hook_resid_post | 11 | 131072 | 64 | layer_norm |

## [gemma-2b-res-jb](https://huggingface.co/jbloom/Gemma-2b-Residual-Stream-SAEs)

- **Huggingface Repo**: jbloom/Gemma-2b-Residual-Stream-SAEs
- **model**: gemma-2b
- **Additional Links**:
- [Model](https://huggingface.co/google/gemma-2b)
- [Dashboards](https://www.neuronpedia.org/gemma2b-res-jb)

| hook_name | hook_layer | d_sae | context_size | normalize_activations |
|:--------------------------|-------------:|--------:|---------------:|:-------------------------|
| blocks.0.hook_resid_post | 0 | 16384 | 1024 | none |
| blocks.6.hook_resid_post | 6 | 16384 | 1024 | none |
| blocks.12.hook_resid_post | 12 | 16384 | 1024 | expected_average_only_in |

## [gemma-2b-it-res-jb](https://huggingface.co/jbloom/Gemma-2b-IT-Residual-Stream-SAEs)

- **Huggingface Repo**: jbloom/Gemma-2b-IT-Residual-Stream-SAEs
- **model**: gemma-2b-it
- **Additional Links**:
- [Model](https://huggingface.co/google/gemma-2b-it)
- [Dashboards](https://www.neuronpedia.org/gemma2bit-res-jb)

| hook_name | hook_layer | d_sae | context_size | normalize_activations |
|:--------------------------|-------------:|--------:|---------------:|:------------------------|
| blocks.12.hook_resid_post | 12 | 16384 | 1024 | none |

## [mistral-7b-res-wg](https://huggingface.co/JoshEngels/Mistral-7B-Residual-Stream-SAEs)

- **Huggingface Repo**: JoshEngels/Mistral-7B-Residual-Stream-SAEs
- **model**: mistral-7b

| hook_name | hook_layer | d_sae | context_size | normalize_activations |
|:-------------------------|-------------:|--------:|---------------:|:------------------------|
| blocks.8.hook_resid_pre | 8 | 65536 | 256 | none |
| blocks.16.hook_resid_pre | 16 | 65536 | 256 | none |
| blocks.24.hook_resid_pre | 24 | 65536 | 256 | none |
