[PUBLISHER] Merge #44

* PUSH NOTE : GPU mode - Sparsity.md
* PUSH NOTE : Reinforcement Learning - An Introduction - Chapter 11.md
dgcnz · Oct 1, 2024 · 28a897f
1 parent a8f283c
commit 28a897f

Showing 2 changed files with 50 additions and 0 deletions.
diff --git a/docs/100 Reference notes/104 Other/GPU mode - Sparsity.md b/docs/100 Reference notes/104 Other/GPU mode - Sparsity.md
@@ -0,0 +1,13 @@
+---
+authors:
+  - "[[Jesse Cai|Jesse Cai]]"
+year: 2024
+tags:
+  - lecture
+  - presentation
+url: https://github.com/gpu-mode/lectures/blob/main/lecture_011/sparsity.pptx
+share: true
+---
+### Notes
+- [ ] #todo take notes
+
diff --git a/...erence notes/104 Other/Reinforcement Learning - An Introduction - Chapter 11.md b/...erence notes/104 Other/Reinforcement Learning - An Introduction - Chapter 11.md
@@ -8,6 +8,43 @@ tags:
 url: 
 share: true
 ---
+## 11.1 Semi-gradient Methods
+
+> [!NOTE] Equation 11.1: Per-step importance sampling ratio
+> 
+> $$
+> \rho_t \doteq \rho_{t:t} = \frac{\pi(A_t \mid S_t)}{b(A_t \mid S_t)}
+> $$
+
+
+#todo
+
+## 11.4 Linear Value-function Geometry
+
+TODO:
+- [x] 11.11 mu norm equation ✅ 2024-10-01
+- [x] 11.17 and 11.18 bellman error ✅ 2024-10-01
+- [ ] 11.19 mean square bellman error
+
+> [!NOTE] Equation 11.11: $\mu$-norm
+> 
+> $$
+> ||\mathbf{v}||^2_\mu \doteq \sum_{s \in \mathcal{S}} \mu(s) v(s)^2
+> $$
+
+> [!NOTE] Equations 11.17 and 11.18: Bellman error
+> 
+> $$
+> \begin{align}
+> \bar{\delta}_{\mathbf{w}}(s) &\doteq \left( \sum_a \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)[r + \gamma v_{\mathbf{w}}(s')] \right) - v_{\mathbf{w}}(s) \tag{11.17} \\
+> &= \mathbb{E}_\pi[R_{t+1} + \gamma v_{\mathbf{w}}(S_{t+1}) - v_{\mathbf{w}}(S_{t}) \mid S_t = s, A_t \sim \pi] \tag{11.18}
+> \end{align}
+> $$
+
+
+
+
+
 ## 11.5 Gradient Descent in the Bellman Error
 
 > [!NOTE] Mean-squared temporal difference error