generated from jobindjohn/obsidian-publish-mkdocs
Commit
* PUSH NOTE : PyTorch Conference 2024 - Fast Sparse Vision Transformers with minimal accuracy loss.md
* PUSH ATTACHMENT : Pasted image 20240928133556.png
* PUSH ATTACHMENT : Pasted image 20240928133618.png
* PUSH ATTACHMENT : Pasted image 20240928133712.png
* PUSH ATTACHMENT : Pasted image 20241001102234.png
* PUSH NOTE : Reinforcement Learning - An Introduction - Chapter 7.md
* PUSH NOTE : Reinforcement Learning - An Introduction - Chapter 5.md
* PUSH ATTACHMENT : Pasted image 20240929183258.png
* PUSH NOTE : Reinforcement Learning - An Introduction - Chapter 3.md
* PUSH ATTACHMENT : Pasted image 20240928215826.png
* PUSH NOTE : Reinforcement Learning - An Introduction - Chapter 11.md
* PUSH NOTE : PyTorch internals.md
* PUSH ATTACHMENT : Pasted image 20240928131008.png
* PUSH NOTE : Reinforcement Learning - An Introduction.md
* PUSH NOTE : Reinforcement Learning - An Introduction - Chapter 9.md
* PUSH NOTE : Reinforcement Learning - An Introduction - Chapter 6.md
Showing 16 changed files with 355 additions and 2 deletions.
22 changes: 22 additions & 0 deletions
...Conference 2024 - Fast Sparse Vision Transformers with minimal accuracy loss.md
---
authors:
  - "[[Jesse Cai|Jesse Cai]]"
year: 2024
tags:
  - presentation
url: https://static.sched.com/hosted_files/pytorch2024/c6/Sparsifying%20ViT%20lightning%20talk%20slides.pdf?_gl=1*19zah9b*_gcl_au*MTk3MjgxODE5OC4xNzI3MjU4NDM2*FPAU*MTk3MjgxODE5OC4xNzI3MjU4NDM2
share: true
---
Nice, it is on `torchao`

![[Pasted image 20240928133556.png|Pasted image 20240928133556.png]]

![[Pasted image 20240928133618.png|Pasted image 20240928133618.png]]

![[Pasted image 20240928133712.png|Pasted image 20240928133712.png]]

![[Pasted image 20241001102234.png|Pasted image 20241001102234.png]]

Notes:
- Don't quite understand what Core or AO mean in this context, but at least `torch.compile` is acknowledged :p
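
A minimal sketch (mine, not from the slides) of what 2:4 semi-structured sparsity looks like through PyTorch core's `torch.sparse.to_sparse_semi_structured`; `torchao` wraps a model-level flow on top of this, whose exact API I'm not reproducing here. Assumes a recent PyTorch build and an Ampere-or-newer GPU.

```python
import torch
from torch.sparse import to_sparse_semi_structured

# Dense fp16 linear layer on the GPU (the 2:4 sparse kernels need CUDA + fp16/bf16/int8).
linear = torch.nn.Linear(256, 256, bias=False).half().cuda().eval()

# Hand-made 2:4 pattern: keep 2 of every 4 consecutive weights.
mask = torch.tensor([1, 1, 0, 0], dtype=torch.bool, device="cuda").tile(256, 64)

# Zero the pruned weights, then swap in the semi-structured sparse representation.
linear.weight = torch.nn.Parameter(
    to_sparse_semi_structured(linear.weight.detach().masked_fill(~mask, 0))
)

x = torch.rand(64, 256, dtype=torch.float16, device="cuda")
with torch.inference_mode():
    print(linear(x).shape)  # the matmul now runs on the sparse kernels
```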
12 changes: 12 additions & 0 deletions
PyTorch internals.md
---
authors:
  - "[[Edward Z. Yang|Edward Z. Yang]]"
year: 2019
tags:
  - blog
url: http://blog.ezyang.com/2019/05/pytorch-internals/
share: true
---

Depending on tensor metadata (whether it is a CUDA tensor, a sparse tensor, etc.), an operation is dispatched to a different implementation.
![[Pasted image 20240928131008.png|500]]
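
To make the dispatch point concrete (my own toy example, not from the post): the same Python call gets routed to different kernels depending on the device and layout recorded in the tensor's metadata.

```python
import torch

a = torch.randn(4, 4)
y = torch.mm(a, a)                 # dispatched to the dense CPU kernel
print(y.device, y.layout)          # cpu torch.strided

if torch.cuda.is_available():
    b = a.cuda()
    print(torch.mm(b, b).device)   # same call, CUDA kernel this time

s = a.to_sparse()                  # COO layout
y_sp = torch.sparse.mm(s, a)       # routed to a sparse matmul implementation
print(y_sp.layout)                 # the result comes back as a dense (strided) tensor
```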
34 changes: 34 additions & 0 deletions
...erence notes/104 Other/Reinforcement Learning - An Introduction - Chapter 11.md
---
authors:
  - "[[Richard S. Sutton|Richard S. Sutton]]"
  - "[[Andrew G. Barton|Andrew G. Barton]]"
year: 2018
tags:
  - textbook
url:
share: true
---
## 11.5 Gradient Descent in the Bellman Error

> [!NOTE] Mean-squared temporal difference error
>
> $$
> \begin{align}
> \overline{TDE}(\mathbf{w}) &= \sum_{s \in \mathcal{S}} \mu(s) \mathbb{E}\left[\delta_t^2 \mid S_t = s, A_t \sim \pi \right] \\
> &= \sum_{s \in \mathcal{S}} \mu(s) \mathbb{E}\left[\rho_t \delta_t^2 \mid S_t = s, A_t \sim b \right] \\
> &= \mathbb{E}_b\left[\rho_t \delta_t^2 \right]
> \end{align}
> $$

> [!NOTE] Equation 11.23: Weight update of the naive residual-gradient algorithm
>
> $$
> \begin{align}
> \mathbf{w}_{t+1} &= \mathbf{w}_t - \frac{1}{2} \alpha \nabla(\rho_t \delta_t^2) \\
> &= \mathbf{w}_t - \alpha \rho_t \delta_t \nabla(\delta_t) \\
> &= \mathbf{w}_t + \alpha \rho_t \delta_t (\nabla \hat{v}(S_t, \mathbf{w}_t) - \gamma \nabla \hat{v}(S_{t+1}, \mathbf{w}_t)) \tag{11.23} \\
> \end{align}
> $$
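
A small sketch (setup and names are mine) of one update of the naive residual-gradient algorithm with linear function approximation, $\hat{v}(s, \mathbf{w}) = \mathbf{w}^\top \mathbf{x}(s)$, so that $\nabla \hat{v}(s, \mathbf{w}) = \mathbf{x}(s)$:

```python
import numpy as np

def residual_gradient_step(w, x_t, x_tp1, r_tp1, rho_t, alpha=0.1, gamma=0.99):
    """One naive residual-gradient update (Eq. 11.23) for linear v(s, w) = w @ x(s)."""
    delta = r_tp1 + gamma * w @ x_tp1 - w @ x_t        # TD error delta_t
    # grad(delta_t) = gamma * x(S_{t+1}) - x(S_t), hence the sign flip in the last line of 11.23:
    return w + alpha * rho_t * delta * (x_t - gamma * x_tp1)

w = np.zeros(3)
w = residual_gradient_step(w, x_t=np.array([1.0, 0.0, 0.0]),
                           x_tp1=np.array([0.0, 1.0, 0.0]), r_tp1=1.0, rho_t=1.0)
print(w)   # approximately [0.1, -0.099, 0.]
```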
145 changes: 145 additions & 0 deletions
...ference notes/104 Other/Reinforcement Learning - An Introduction - Chapter 3.md
---
authors:
  - "[[Richard S. Sutton|Richard S. Sutton]]"
  - "[[Andrew G. Barton|Andrew G. Barton]]"
year: 2018
tags:
  - textbook
url:
share: true
---
## 3.1 The Agent-Environment Interface

> [!NOTE] Equation 3.1: Trajectory
>
> $$
> S_0,A_0,R_1,S_1,A_1,R_2,S_2,A_2,R_3, \dots \tag{3.1}
> $$

> [!NOTE] Equation 3.2: MDP dynamics
>
> $$
> p(s', r \mid s, a) \doteq \Pr \{ S_t = s', R_t = r \mid S_{t-1} = s, A_{t-1} = a \} \tag{3.2}
> $$

You can obtain the *state-transition probabilities* from the dynamics with the law of total probability, and likewise the expected rewards.
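
Spelling that out in the book's notation (if I recall the numbering right, these are Eqs. 3.4 and 3.5):

$$
\begin{align}
p(s' \mid s, a) &\doteq \Pr\{S_t = s' \mid S_{t-1} = s, A_{t-1} = a\} = \sum_{r \in \mathcal{R}} p(s', r \mid s, a) \\
r(s, a) &\doteq \mathbb{E}[R_t \mid S_{t-1} = s, A_{t-1} = a] = \sum_{r \in \mathcal{R}} r \sum_{s' \in \mathcal{S}} p(s', r \mid s, a)
\end{align}
$$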

## 3.2 Goals and Rewards

> [!FAQ]- What is the reward hypothesis?
>
> The reward hypothesis is the idea that **all of what we mean by goals** and purposes can be well thought of as the **maximization** of the expected value of the cumulative sum of a received scalar signal (called **reward**).

- The reward signal is your way of communicating to the agent what you want it to achieve, **not how you want it to achieve it**.
## 3.3 Returns and Episodes

> [!NOTE] Equation 3.7: Undiscounted return
>
> $$
> G_t \doteq R_{t+1} + R_{t+2} + R_{t+3} + \dots + R_T \tag{3.7}
> $$

> [!NOTE] Equation 3.8: Discounted return
>
> $$
> G_t \doteq R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \tag{3.8}
> $$
>
> Where $\gamma$ is the discount rate.

> [!NOTE] Equation 3.9: Recursive definition of return
>
> You can group the terms of Eq 3.8 to obtain a recursive definition of the return.
>
> $$
> G_t \doteq R_{t+1} + \gamma G_{t+1} \tag{3.9}
> $$
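
A quick numerical check of the recursion (my own snippet): computing $G_0$ directly from Eq 3.8 and by folding Eq 3.9 backwards from the end of the episode gives the same number.

```python
gamma = 0.9
rewards = [1.0, 0.0, 2.0, 3.0]           # R_1 ... R_T for a short episode

# Direct sum, Eq 3.8 (finite because the episode terminates)
g_direct = sum(gamma**k * r for k, r in enumerate(rewards))

# Backward recursion, Eq 3.9, starting from G_T = 0
g_recursive = 0.0
for r in reversed(rewards):
    g_recursive = r + gamma * g_recursive

print(g_direct, g_recursive)             # both 4.807
```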

## 3.4 Unified Notation for Episodic and Continuing Tasks

![[Pasted image 20240928215826.png|Pasted image 20240928215826.png]]

## 3.5 Policies and Value Functions

A policy $\pi(a \mid s)$ is a probability distribution over actions given states.

> [!NOTE] Equation 3.12: State-value function
>
> $$
> v_{\pi}(s) \doteq \mathbb{E}_{\pi}[G_t \mid S_t = s] \;\; \forall s \in \mathcal{S} \tag{3.12}
> $$

> [!NOTE] Equation 3.13: Action-value function
>
> $$
> q_{\pi}(s, a) \doteq \mathbb{E}_{\pi}[G_t \mid S_t = s, A_t = a] \;\; \forall s \in \mathcal{S}, a \in \mathcal{A} \tag{3.13}
> $$

> [!NOTE] Equation 3.14: Bellman equation for $v_{\pi}$
>
> $$
> \begin{align}
> v_\pi(s) &\doteq \mathbb{E}_{\pi}[G_t \mid S_t = s] \\
> &= \mathbb{E}_{\pi}[R_{t+1} + \gamma G_{t+1} \mid S_t = s] \tag{by (3.9)} \\
> &= \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a) \left[r + \gamma \mathbb{E}_{\pi}\left[G_{t+1} \mid S_{t+1} = s'\right]\right] \\
> &= \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a) [r + \gamma v_\pi(s')] \tag{3.14}
> \end{align}
> $$
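
Eq 3.14 is also the basis of iterative policy evaluation: treat it as an update rule and sweep the states until the values stop changing. A sketch (the two-state MDP and the policy are made up by me):

```python
import numpy as np

states, actions = [0, 1], [0, 1]
# p[s][a] = list of (next_state, reward, probability) -- a made-up MDP.
p = {
    0: {0: [(0, 0.0, 0.5), (1, 1.0, 0.5)], 1: [(1, 2.0, 1.0)]},
    1: {0: [(0, 0.0, 1.0)],                1: [(1, 1.0, 1.0)]},
}
pi = {0: [0.5, 0.5], 1: [0.9, 0.1]}      # pi(a|s), an arbitrary stochastic policy
gamma = 0.9

v = np.zeros(len(states))
for _ in range(200):                      # sweep until (approximately) a fixed point
    for s in states:
        v[s] = sum(
            pi[s][a] * prob * (r + gamma * v[s2])
            for a in actions
            for (s2, r, prob) in p[s][a]
        )
print(v)                                  # approximate v_pi
```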
## 3.6 Optimal Policies and Optimal Value Functions

> [!NOTE] Equation 3.15: Optimal state-value function
>
> $$
> v_*(s) \doteq \max_{\pi} v_{\pi}(s) \tag{3.15}
> $$

> [!NOTE] Equation 3.16: Optimal action-value function
>
> $$
> q_*(s, a) \doteq \max_{\pi} q_{\pi}(s, a) \tag{3.16}
> $$

> [!NOTE] Equation 3.17
>
> $$
> q_*(s, a) = \mathbb{E}[R_{t+1} + \gamma v_*(S_{t+1}) \mid S_t = s, A_t = a] \tag{3.17}
> $$

> [!NOTE] Equation 3.18 and 3.19: Bellman optimality equations for $v_*$
>
> $$
> \begin{align}
> v_*(s) &= \max_{a \in \mathcal{A}(s)} q_{\pi_*}(s, a) \\
> &= \max_{a} \mathbb{E}_{\pi_*}[G_t \mid S_t = s, A_t = a] \\
> &= \max_{a} \mathbb{E}_{\pi_*}[R_{t+1} + \gamma G_{t+1} \mid S_t = s, A_t = a] \tag{by (3.9)} \\
> &= \max_{a} \mathbb{E}[R_{t+1} + \gamma v_*(S_{t+1}) \mid S_t = s, A_t = a] \tag{3.18} \\
> &= \max_{a} \sum_{s', r} p(s', r \mid s, a) [r + \gamma v_*(s')] \tag{3.19} \\
> \end{align}
> $$

> [!NOTE] Equation 3.20: Bellman optimality equation for $q_*$
>
> $$
> \begin{align}
> q_*(s, a) &= \mathbb{E}[R_{t+1} + \gamma \max_{a'} q_*(S_{t+1}, a') \mid S_t = s, A_t = a] \\
> &= \sum_{s', r} p(s', r \mid s, a) [r + \gamma \max_{a'} q_*(s', a')] \tag{3.20}
> \end{align}
> $$

**Any policy that is greedy with respect to the optimal evaluation function $v_*$ is an optimal policy.**
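
This is what makes value iteration work: apply Eq 3.19 as an update rule until it converges to $v_*$, then act greedily. A sketch (mine), reusing the made-up two-state MDP from the policy-evaluation snippet above:

```python
import numpy as np

# p[s][a] = list of (next_state, reward, probability) -- same made-up MDP as before.
p = {
    0: {0: [(0, 0.0, 0.5), (1, 1.0, 0.5)], 1: [(1, 2.0, 1.0)]},
    1: {0: [(0, 0.0, 1.0)],                1: [(1, 1.0, 1.0)]},
}
gamma = 0.9
v = np.zeros(2)

def q(s, a):
    # One-step lookahead: the inner sum of Eq 3.19 / Eq 3.20.
    return sum(prob * (r + gamma * v[s2]) for (s2, r, prob) in p[s][a])

for _ in range(200):                      # value iteration sweeps
    for s in p:
        v[s] = max(q(s, a) for a in p[s])

pi_star = {s: max(p[s], key=lambda a: q(s, a)) for s in p}   # greedy w.r.t. v_*
print(v, pi_star)
```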
73 changes: 73 additions & 0 deletions
...ference notes/104 Other/Reinforcement Learning - An Introduction - Chapter 5.md
---
authors:
  - "[[Richard S. Sutton|Richard S. Sutton]]"
  - "[[Andrew G. Barton|Andrew G. Barton]]"
year: 2018
tags:
  - textbook
url:
share: true
---
## 5.1 Monte Carlo prediction

- First-visit MC
  - independence assumptions, easier theoretically
- Every-visit MC

- [ ] TODO: finish notes
## 5.4 Monte Carlo Control without Exploring Starts

- $\epsilon$-greedy policy (see the sketch below)
  - All non-greedy actions have a minimum probability of $\frac{\epsilon}{|\mathcal{A}|}$
  - The greedy action has probability $(1 - \epsilon) + \frac{\epsilon}{|\mathcal{A}|}$
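
A sketch (mine) of sampling an action with exactly those probabilities:

```python
import numpy as np

def epsilon_greedy(q_values, epsilon=0.1, rng=np.random.default_rng(0)):
    n = len(q_values)
    probs = np.full(n, epsilon / n)            # every action gets epsilon/|A|
    probs[np.argmax(q_values)] += 1 - epsilon  # greedy action gets the remaining mass
    return rng.choice(n, p=probs)

print(epsilon_greedy(np.array([0.1, 0.5, 0.2])))   # usually 1, occasionally 0 or 2
```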

- [ ] TODO: finish notes

## 5.5 Off-policy Prediction via Importance Sampling

Given a starting state $S_t$, the probability of the subsequent state-action trajectory, $A_t, S_{t+1}, A_{t+1}, \dots, S_T$, under the policy $\pi$ is given by:

$$
\begin{align}
\Pr\{A_t, S_{t+1}, A_{t+1}, \dots, S_T \mid S_t, A_{t:T-1} \sim \pi\} & = \prod_{k=t}^{T-1} \pi(A_k \mid S_k) p(S_{k+1} \mid S_k, A_k)
\end{align}
$$

> [!NOTE] Equation 5.3: Importance sampling ratio
>
> $$
> \rho_{t:T-1} \doteq \frac{\prod_{k=t}^{T-1} \pi(A_k \mid S_k) p(S_{k+1} \mid S_k, A_k)}{\prod_{k=t}^{T-1} b(A_k \mid S_k) p(S_{k+1} \mid S_k, A_k)} = \prod_{k=t}^{T-1} \frac{\pi(A_k \mid S_k)}{b(A_k \mid S_k)} \tag{5.3}
> $$

> [!NOTE] Equation 5.4: Value function for target policy $\pi$ under behavior policy $b$
>
> The importance sampling ratio allows us to compute the correct expected value to obtain $v_\pi$:
>
> $$
> \begin{align}
> v_\pi(s) &\doteq \mathbb{E}_b[\rho_{t:T - 1}G_t \mid S_t = s] \tag{5.4} \\
> \end{align}
> $$

> [!NOTE] Equation 5.5: Ordinary importance sampling
>
> $$
> V(s) \doteq \frac{\sum_{t \in \mathcal{T}(s)} \rho_{t:T-1} G_t}{|\mathcal{T}(s)|} \tag{5.5}
> $$

> [!NOTE] Equation 5.6: Weighted importance sampling
>
> $$
> V(s) \doteq \frac{\sum_{t \in \mathcal{T}(s)} \rho_{t:T-1} G_t}{\sum_{t \in \mathcal{T}(s)} \rho_{t:T-1}} \tag{5.6}
> $$

![[Pasted image 20240929183258.png|Pasted image 20240929183258.png]]

In practice, weighted importance sampling has much lower error in the early episodes.
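
A sketch (the numbers are made up) of the two estimators given per-episode returns $G_t$ and ratios $\rho_{t:T-1}$ collected under $b$: the ordinary estimate divides by the number of visits, the weighted one by the sum of the ratios.

```python
import numpy as np

rhos = np.array([0.5, 2.0, 1.2, 0.1])     # rho_{t:T-1} for each first visit to s
returns = np.array([1.0, 3.0, 2.0, 0.0])  # corresponding returns G_t under b

v_ordinary = (rhos * returns).sum() / len(returns)   # Eq. 5.5
v_weighted = (rhos * returns).sum() / rhos.sum()     # Eq. 5.6
print(v_ordinary, v_weighted)
```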

## 5.6 Incremental Implementation

#todo
26 changes: 26 additions & 0 deletions
...ference notes/104 Other/Reinforcement Learning - An Introduction - Chapter 7.md
---
authors:
  - "[[Richard S. Sutton|Richard S. Sutton]]"
  - "[[Andrew G. Barton|Andrew G. Barton]]"
year: 2018
tags:
  - textbook
url:
share: true
---
## 7.1 $n$-step TD prediction

One-step return:

$$
G_{t:t+1} \doteq R_{t+1} + \gamma V_t(S_{t+1})
$$

> [!NOTE] Equation 7.1: $n$-step return
>
> $$
> G_{t:t+n} \doteq R_{t+1} + \gamma R_{t+2} + \dots + \gamma^{n-1} R_{t+n} + \gamma^n V_{t + n - 1}(S_{t+n}) \tag{7.1}
> $$
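
A tiny sketch (mine) of Eq 7.1, given the next $n$ rewards and the bootstrap value $V_{t+n-1}(S_{t+n})$:

```python
def n_step_return(rewards_ahead, v_bootstrap, gamma=0.9):
    """rewards_ahead = [R_{t+1}, ..., R_{t+n}]; v_bootstrap = V_{t+n-1}(S_{t+n})."""
    g = sum(gamma**k * r for k, r in enumerate(rewards_ahead))
    return g + gamma ** len(rewards_ahead) * v_bootstrap

print(n_step_return([1.0, 0.0, 2.0], v_bootstrap=0.5))   # 3-step return, n = 3
```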