Actually doing merge
JoshEngels committed Jul 4, 2024
2 parents 52780c0 + 328e0de commit c362e81
Showing 29 changed files with 908 additions and 203 deletions.
86 changes: 86 additions & 0 deletions CHANGELOG.md



## v3.11.0 (2024-07-04)

### Feature

* feat: make pretrained sae directory docs page (#213)

* make pretrained sae directory docs page

* type issue weirdness

* type issue weirdness ([`b8a99ab`](https://github.com/jbloomAus/SAELens/commit/b8a99ab4dfe8f7790a3b15f41e351fbc3b82f1ab))


## v3.10.0 (2024-07-04)

### Feature

* feat: make activations_store restart the dataset when it runs out (#207), as sketched below this list

* make activations_store restart the dataset when it runs out

* remove misleading comments

* allow StopIteration to bubble up where appropriate

* add test to ensure that StopIteration is raised

* formatting

* more formatting

* format tweak so we can re-try ci

* add deps back ([`91f4850`](https://github.com/jbloomAus/SAELens/commit/91f48502c39cd573d5f28aba2f3295c7694112e6))

* feat: allow models to be passed in as overrides (#210) ([`dd95996`](https://github.com/jbloomAus/SAELens/commit/dd95996efaa46c779b85ead9e52a8342869cfc24))
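
A conceptual sketch of the restart behavior from the first feature above (this is not the library's actual implementation; the attribute names are hypothetical):

```python
import warnings


def next_batch(store):
    """Fetch the next batch, restarting the dataset once if it is exhausted."""
    try:
        return next(store.iterator)
    except StopIteration:
        warnings.warn("Dataset exhausted; restarting from the beginning.")
        store.iterator = iter(store.dataset)  # hypothetical attributes
        # If the dataset yields nothing even after a restart, let the second
        # StopIteration bubble up to the caller.
        return next(store.iterator)
```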

### Fix

* fix: Activation store factor unscaling fold fix (#212)

* add unscaling to evals

* fix act norm unscaling missing

* improved variance explained, still off for that prompt

* format

* why suddenly a typingerror and only in CI? ([`1db84b5`](https://github.com/jbloomAus/SAELens/commit/1db84b5ca4ab82fae9edbe98c1e9a563ed1eb3c9))


## v3.9.2 (2024-07-03)

### Fix

* fix: Gated SAE Note Loading (#211)

* fix: add tests, make pass

* not in ([`b083feb`](https://github.com/jbloomAus/SAELens/commit/b083feb5ffb5b7f45669403786c3c7593aa1d3ba))

### Unknown

* SAETrainingRunner takes optional HFDataset (#206)

* SAETrainingRunner takes optional HFDataset

* more explicit errors when the buffer is too large for the dataset

* format

* add warnings when a new dataset is added

* replace default dataset with empty string

* remove valueerror ([`2c8fb6a`](https://github.com/jbloomAus/SAELens/commit/2c8fb6aeed214ff47dccfe427eb2881aca4e6808))
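
A minimal sketch of the new call pattern (the keyword name `override_dataset` and the small example dataset are assumptions, not confirmed by this changelog):

```python
from datasets import load_dataset

from sae_lens import LanguageModelSAERunnerConfig, SAETrainingRunner

# Pass a pre-loaded Hugging Face dataset directly to the runner instead of a
# dataset path; per the notes above, the default dataset path is now "".
dataset = load_dataset("NeelNanda/pile-10k", split="train")
cfg = LanguageModelSAERunnerConfig(dataset_path="")
runner = SAETrainingRunner(cfg, override_dataset=dataset)  # assumed kwarg name
sae = runner.run()
```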


## v3.9.1 (2024-07-01)

### Fix

* fix: pin typing-extensions version (#205) ([`3f0e4fe`](https://github.com/jbloomAus/SAELens/commit/3f0e4fe9e1a353e8b9563567919734af662ab69d))


## v3.9.0 (2024-07-01)

### Feature
128 changes: 128 additions & 0 deletions docs/generate_sae_table.py
# type: ignore
import json
from pathlib import Path

import pandas as pd
import yaml
from huggingface_hub import hf_hub_download
from tqdm import tqdm

from sae_lens import SAEConfig
from sae_lens.toolkit.pretrained_sae_loaders import (
    get_sae_config_from_hf,
    handle_config_defaulting,
)

# Config fields to surface as columns in the generated tables.
INCLUDED_CFG = [
    # "id",
    # "architecture",
    # "model_name",
    "hook_name",
    "hook_layer",
    "d_sae",
    "context_size",
    "dataset_path",
    "normalize_activations",
]


def on_pre_build(config):
    print("Generating SAE table...")
    generate_sae_table()
    print("SAE table generation complete.")


def generate_sae_table():
    # Read the YAML file
    yaml_path = Path("sae_lens/pretrained_saes.yaml")
    with open(yaml_path, "r") as file:
        data = yaml.safe_load(file)

    # Start the Markdown content
    markdown_content = "# Pretrained SAEs\n\n"
    markdown_content += "This is a list of SAEs importable from the SAELens package. Click on each link for more details.\n\n"
    markdown_content += "*This file contains the contents of `sae_lens/pretrained_saes.yaml` in Markdown*\n\n"

    # Generate content for each model
    for model_name, model_info in tqdm(data["SAE_LOOKUP"].items()):
        repo_link = f"https://huggingface.co/{model_info['repo_id']}"
        markdown_content += f"## [{model_name}]({repo_link})\n\n"
        markdown_content += f"- **Huggingface Repo**: {model_info['repo_id']}\n"
        markdown_content += f"- **model**: {model_info['model']}\n"

        if "links" in model_info:
            markdown_content += "- **Additional Links**:\n"
            for link_type, url in model_info["links"].items():
                markdown_content += f"  - [{link_type.capitalize()}]({url})\n"

        markdown_content += "\n"

        # Fetch and normalize the config for each SAE in this release.
        for info in tqdm(model_info["saes"]):
            # TODO: remove this special case by overriding the config
            # explicitly in the YAML.
            if model_info.get("conversion_func") == "connor_rob_hook_z":
                repo_id = model_info["repo_id"]
                folder_name = info["path"]
                config_path = folder_name.split(".pt")[0] + "_cfg.json"
                config_path = hf_hub_download(repo_id, config_path)
                with open(config_path, "r") as f:
                    old_cfg_dict = json.load(f)

                cfg = {
                    "architecture": "standard",
                    "d_in": old_cfg_dict["act_size"],
                    "d_sae": old_cfg_dict["dict_size"],
                    "dtype": "float32",
                    "device": "cpu",
                    "model_name": "gpt2-small",
                    "hook_name": old_cfg_dict["act_name"],
                    "hook_layer": old_cfg_dict["layer"],
                    "hook_head_index": None,
                    "activation_fn_str": "relu",
                    "apply_b_dec_to_input": True,
                    "finetuning_scaling_factor": False,
                    "sae_lens_training_version": None,
                    "prepend_bos": True,
                    "dataset_path": "Skylion007/openwebtext",
                    "context_size": 128,
                    "normalize_activations": "none",
                    "dataset_trust_remote_code": True,
                }
            else:
                cfg = get_sae_config_from_hf(
                    model_info["repo_id"],
                    info["path"],
                )
            cfg = handle_config_defaulting(cfg)
            cfg = SAEConfig.from_dict(cfg).to_dict()
            info.update(cfg)

        # Build a DataFrame of the SAEs and keep only the INCLUDED_CFG columns.
        df = pd.DataFrame(model_info["saes"])
        df = df[INCLUDED_CFG]

        # Render the Markdown table.
        table = df.to_markdown(index=False)
        markdown_content += table + "\n\n"

    # Write the content to a Markdown file
    output_path = Path("docs/sae_table.md")
    with open(output_path, "w") as file:
        file.write(markdown_content)


if __name__ == "__main__":
    generate_sae_table()
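
The table-rendering step above reduces to `pandas.DataFrame.to_markdown`, which depends on the optional `tabulate` package. A minimal, self-contained illustration with made-up rows:

```python
import pandas as pd

saes = [
    {"hook_name": "blocks.0.hook_resid_pre", "hook_layer": 0, "d_sae": 24576},
    {"hook_name": "blocks.1.hook_resid_pre", "hook_layer": 1, "d_sae": 24576},
]
df = pd.DataFrame(saes)
# Prints a pipe-delimited Markdown table like those in docs/sae_table.md.
print(df.to_markdown(index=False))
```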
173 changes: 173 additions & 0 deletions docs/sae_table.md
# Pretrained SAEs

This is a list of SAEs importable from the SAELens package. Click on each link for more details.

*This file contains the contents of `sae_lens/pretrained_saes.yaml` in Markdown*
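
Each release below can be loaded by name. A minimal sketch using `SAE.from_pretrained`, where the release is the section heading and the SAE id is assumed here to match the `hook_name` column:

```python
from sae_lens import SAE

# Returns the SAE, its config dict, and (if available) a feature sparsity tensor.
sae, cfg_dict, sparsity = SAE.from_pretrained(
    release="gpt2-small-res-jb",
    sae_id="blocks.8.hook_resid_pre",
    device="cpu",
)
print(cfg_dict["d_sae"])  # 24576, per the table below
```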

## [gpt2-small-res-jb](https://huggingface.co/jbloom/GPT2-Small-SAEs-Reformatted)

- **Huggingface Repo**: jbloom/GPT2-Small-SAEs-Reformatted
- **model**: gpt2-small
- **Additional Links**:
- [Model](https://huggingface.co/gpt2)
- [Dashboards](https://www.neuronpedia.org/gpt2sm-res-jb)
- [Publication](https://www.lesswrong.com/posts/f9EgfLSurAiqRJySD/open-source-sparse-autoencoders-for-all-residual-stream)

| hook_name | hook_layer | d_sae | context_size | normalize_activations |
|:--------------------------|-------------:|--------:|---------------:|:------------------------|
| blocks.0.hook_resid_pre | 0 | 24576 | 128 | none |
| blocks.1.hook_resid_pre | 1 | 24576 | 128 | none |
| blocks.2.hook_resid_pre | 2 | 24576 | 128 | none |
| blocks.3.hook_resid_pre | 3 | 24576 | 128 | none |
| blocks.4.hook_resid_pre | 4 | 24576 | 128 | none |
| blocks.5.hook_resid_pre | 5 | 24576 | 128 | none |
| blocks.6.hook_resid_pre | 6 | 24576 | 128 | none |
| blocks.7.hook_resid_pre | 7 | 24576 | 128 | none |
| blocks.8.hook_resid_pre | 8 | 24576 | 128 | none |
| blocks.9.hook_resid_pre | 9 | 24576 | 128 | none |
| blocks.10.hook_resid_pre | 10 | 24576 | 128 | none |
| blocks.11.hook_resid_pre | 11 | 24576 | 128 | none |
| blocks.11.hook_resid_post | 11 | 24576 | 128 | none |

## [gpt2-small-hook-z-kk](https://huggingface.co/ckkissane/attn-saes-gpt2-small-all-layers)

- **Huggingface Repo**: ckkissane/attn-saes-gpt2-small-all-layers
- **model**: gpt2-small
- **Additional Links**:
- [Model](https://huggingface.co/gpt2)
- [Dashboards](https://www.neuronpedia.org/gpt2sm-kk)
- [Publication](https://www.lesswrong.com/posts/FSTRedtjuHa4Gfdbr/attention-saes-scale-to-gpt-2-small)

| hook_name | hook_layer | d_sae | context_size | normalize_activations |
|:----------------------|-------------:|--------:|---------------:|:------------------------|
| blocks.0.attn.hook_z | 0 | 24576 | 128 | none |
| blocks.1.attn.hook_z | 1 | 24576 | 128 | none |
| blocks.2.attn.hook_z | 2 | 24576 | 128 | none |
| blocks.3.attn.hook_z | 3 | 24576 | 128 | none |
| blocks.4.attn.hook_z | 4 | 24576 | 128 | none |
| blocks.5.attn.hook_z | 5 | 49152 | 128 | none |
| blocks.6.attn.hook_z | 6 | 24576 | 128 | none |
| blocks.7.attn.hook_z | 7 | 49152 | 128 | none |
| blocks.8.attn.hook_z | 8 | 24576 | 128 | none |
| blocks.9.attn.hook_z | 9 | 24576 | 128 | none |
| blocks.10.attn.hook_z | 10 | 24576 | 128 | none |
| blocks.11.attn.hook_z | 11 | 24576 | 128 | none |
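
These attention SAEs read the concatenation of all per-head `z` vectors at a layer. A sketch of encoding cached activations, assuming the SAE id matches the hook name and `transformer_lens` is available:

```python
from transformer_lens import HookedTransformer

from sae_lens import SAE

model = HookedTransformer.from_pretrained("gpt2")
sae, _, _ = SAE.from_pretrained(
    release="gpt2-small-hook-z-kk",
    sae_id="blocks.5.attn.hook_z",  # assumed id
)
_, cache = model.run_with_cache("The quick brown fox")
z = cache["blocks.5.attn.hook_z"]      # [batch, pos, n_heads, d_head]
feats = sae.encode(z.flatten(-2, -1))  # concatenate heads into d_in, then encode
```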

## [gpt2-small-mlp-tm](https://huggingface.co/tommmcgrath/gpt2-small-mlp-out-saes)

- **Huggingface Repo**: tommmcgrath/gpt2-small-mlp-out-saes
- **model**: gpt2-small
- **Additional Links**:
- [Model](https://huggingface.co/gpt2)

| hook_name | hook_layer | d_sae | context_size | normalize_activations |
|:-----------------------|-------------:|--------:|---------------:|:-------------------------|
| blocks.0.hook_mlp_out | 0 | 24576 | 512 | expected_average_only_in |
| blocks.1.hook_mlp_out | 1 | 24576 | 512 | expected_average_only_in |
| blocks.2.hook_mlp_out | 2 | 24576 | 512 | expected_average_only_in |
| blocks.3.hook_mlp_out | 3 | 24576 | 512 | expected_average_only_in |
| blocks.4.hook_mlp_out | 4 | 24576 | 512 | expected_average_only_in |
| blocks.5.hook_mlp_out | 5 | 24576 | 512 | expected_average_only_in |
| blocks.6.hook_mlp_out | 6 | 24576 | 512 | expected_average_only_in |
| blocks.7.hook_mlp_out | 7 | 24576 | 512 | expected_average_only_in |
| blocks.8.hook_mlp_out | 8 | 24576 | 512 | expected_average_only_in |
| blocks.9.hook_mlp_out | 9 | 24576 | 512 | expected_average_only_in |
| blocks.10.hook_mlp_out | 10 | 24576 | 512 | expected_average_only_in |
| blocks.11.hook_mlp_out | 11 | 24576 | 512 | expected_average_only_in |

## [gpt2-small-res-jb-feature-splitting](https://huggingface.co/jbloom/GPT2-Small-Feature-Splitting-Experiment-Layer-8)

- **Huggingface Repo**: jbloom/GPT2-Small-Feature-Splitting-Experiment-Layer-8
- **model**: gpt2-small
- **Additional Links**:
- [Model](https://huggingface.co/gpt2)
- [Dashboards](https://www.neuronpedia.org/gpt2sm-rfs-jb)

| hook_name | hook_layer | d_sae | context_size | normalize_activations |
|:------------------------|-------------:|--------:|---------------:|:------------------------|
| blocks.8.hook_resid_pre | 8 | 768 | 128 | none |
| blocks.8.hook_resid_pre | 8 | 1536 | 128 | none |
| blocks.8.hook_resid_pre | 8 | 3072 | 128 | none |
| blocks.8.hook_resid_pre | 8 | 6144 | 128 | none |
| blocks.8.hook_resid_pre | 8 | 12288 | 128 | none |
| blocks.8.hook_resid_pre | 8 | 24576 | 128 | none |
| blocks.8.hook_resid_pre | 8 | 49152 | 128 | none |
| blocks.8.hook_resid_pre | 8 | 98304 | 128 | none |

## [gpt2-small-resid-post-v5-32k](https://huggingface.co/jbloom/GPT2-Small-OAI-v5-32k-resid-post-SAEs)

- **Huggingface Repo**: jbloom/GPT2-Small-OAI-v5-32k-resid-post-SAEs
- **model**: gpt2-small

| hook_name | hook_layer | d_sae | context_size | normalize_activations |
|:--------------------------|-------------:|--------:|---------------:|:------------------------|
| blocks.0.hook_resid_post | 0 | 32768 | 64 | layer_norm |
| blocks.1.hook_resid_post | 1 | 32768 | 64 | layer_norm |
| blocks.2.hook_resid_post | 2 | 32768 | 64 | layer_norm |
| blocks.3.hook_resid_post | 3 | 32768 | 64 | layer_norm |
| blocks.4.hook_resid_post | 4 | 32768 | 64 | layer_norm |
| blocks.5.hook_resid_post | 5 | 32768 | 64 | layer_norm |
| blocks.6.hook_resid_post | 6 | 32768 | 64 | layer_norm |
| blocks.7.hook_resid_post | 7 | 32768 | 64 | layer_norm |
| blocks.8.hook_resid_post | 8 | 32768 | 64 | layer_norm |
| blocks.9.hook_resid_post | 9 | 32768 | 64 | layer_norm |
| blocks.10.hook_resid_post | 10 | 32768 | 64 | layer_norm |
| blocks.11.hook_resid_post | 11 | 32768 | 64 | layer_norm |

## [gpt2-small-resid-post-v5-128k](https://huggingface.co/jbloom/GPT2-Small-OAI-v5-128k-resid-post-SAEs)

- **Huggingface Repo**: jbloom/GPT2-Small-OAI-v5-128k-resid-post-SAEs
- **model**: gpt2-small

| hook_name | hook_layer | d_sae | context_size | normalize_activations |
|:--------------------------|-------------:|--------:|---------------:|:------------------------|
| blocks.0.hook_resid_post | 0 | 131072 | 64 | layer_norm |
| blocks.1.hook_resid_post | 1 | 131072 | 64 | layer_norm |
| blocks.2.hook_resid_post | 2 | 131072 | 64 | layer_norm |
| blocks.3.hook_resid_post | 3 | 131072 | 64 | layer_norm |
| blocks.4.hook_resid_post | 4 | 131072 | 64 | layer_norm |
| blocks.5.hook_resid_post | 5 | 131072 | 64 | layer_norm |
| blocks.6.hook_resid_post | 6 | 131072 | 64 | layer_norm |
| blocks.7.hook_resid_post | 7 | 131072 | 64 | layer_norm |
| blocks.8.hook_resid_post | 8 | 131072 | 64 | layer_norm |
| blocks.9.hook_resid_post | 9 | 131072 | 64 | layer_norm |
| blocks.10.hook_resid_post | 10 | 131072 | 64 | layer_norm |
| blocks.11.hook_resid_post | 11 | 131072 | 64 | layer_norm |

## [gemma-2b-res-jb](https://huggingface.co/jbloom/Gemma-2b-Residual-Stream-SAEs)

- **Huggingface Repo**: jbloom/Gemma-2b-Residual-Stream-SAEs
- **model**: gemma-2b
- **Additional Links**:
- [Model](https://huggingface.co/google/gemma-2b)
- [Dashboards](https://www.neuronpedia.org/gemma2b-res-jb)

| hook_name | hook_layer | d_sae | context_size | normalize_activations |
|:--------------------------|-------------:|--------:|---------------:|:-------------------------|
| blocks.0.hook_resid_post | 0 | 16384 | 1024 | none |
| blocks.6.hook_resid_post | 6 | 16384 | 1024 | none |
| blocks.12.hook_resid_post | 12 | 16384 | 1024 | expected_average_only_in |

## [gemma-2b-it-res-jb](https://huggingface.co/jbloom/Gemma-2b-IT-Residual-Stream-SAEs)

- **Huggingface Repo**: jbloom/Gemma-2b-IT-Residual-Stream-SAEs
- **model**: gemma-2b-it
- **Additional Links**:
- [Model](https://huggingface.co/google/gemma-2b-it)
- [Dashboards](https://www.neuronpedia.org/gemma2bit-res-jb)

| hook_name | hook_layer | d_sae | context_size | normalize_activations |
|:--------------------------|-------------:|--------:|---------------:|:------------------------|
| blocks.12.hook_resid_post | 12 | 16384 | 1024 | none |

## [mistral-7b-res-wg](https://huggingface.co/JoshEngels/Mistral-7B-Residual-Stream-SAEs)

- **Huggingface Repo**: JoshEngels/Mistral-7B-Residual-Stream-SAEs
- **model**: mistral-7b

| hook_name | hook_layer | d_sae | context_size | normalize_activations |
|:-------------------------|-------------:|--------:|---------------:|:------------------------|
| blocks.8.hook_resid_pre | 8 | 65536 | 256 | none |
| blocks.16.hook_resid_pre | 16 | 65536 | 256 | none |
| blocks.24.hook_resid_pre | 24 | 65536 | 256 | none |
