diff --git a/_posts/2024-05-30-counting.md b/_posts/2024-05-30-counting.md
index b45aec8..68f76fd 100644
--- a/_posts/2024-05-30-counting.md
+++ b/_posts/2024-05-30-counting.md
@@ -50,8 +50,8 @@ This information helps disambiguate the different regions based on context.
 
 #### Key Propositions
 
-1. **Proposition 1:** If the regional contextual position information is available in the latent representation of the tokens at some layer of a Transformer, the contextual counting task can be solved with a single additional layer.
-2. **Proposition 2:** A causal Transformer with a single layer and no position encoding (NoPE) can infer the regional contextual position.
+- **Proposition 1:** If the regional contextual position information is available in the latent representation of the tokens at some layer of a Transformer, the contextual counting task can be solved with a single additional layer.
+- **Proposition 2:** A causal Transformer with a single layer and no position encoding (NoPE) can infer the regional contextual position.
 
 These propositions imply that a two-layer causal Transformer with NoPE can solve the contextual counting task.
 
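
For concreteness, here is a minimal sketch of the architecture the propositions describe: a two-layer causal Transformer with token embeddings only and no position encoding (NoPE). This is not the post's or paper's code; the class name `TwoLayerNoPETransformer` and all hyperparameters (vocab size, width, head count) are illustrative assumptions.

```python
# Sketch only: a two-layer causal Transformer with NoPE, per the propositions.
import torch
import torch.nn as nn


class TwoLayerNoPETransformer(nn.Module):
    def __init__(self, vocab_size: int = 16, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        # Token embedding only -- deliberately no positional embedding (NoPE).
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model,
            nhead=n_heads,
            dim_feedforward=4 * d_model,
            batch_first=True,
        )
        # Two layers: per the propositions, one layer can infer the regional
        # contextual position, and one additional layer can then solve the
        # contextual counting task.
        self.layers = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        seq_len = tokens.size(1)
        # Causal mask: each token attends only to itself and earlier tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(seq_len)
        h = self.layers(self.embed(tokens), mask=mask, is_causal=True)
        return self.head(h)


# Usage: logits over the (assumed) vocabulary for a random token sequence.
model = TwoLayerNoPETransformer()
logits = model(torch.randint(0, 16, (1, 32)))  # shape: (batch=1, seq=32, vocab=16)
```

With no positional information injected at the input, any position-like signal the model uses must be inferred from the causal attention pattern itself, which is exactly the mechanism Proposition 2 appeals to.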