
Commit

html update
liaoruowang committed May 20, 2017
1 parent 349f74c commit fa28d76
Showing 3 changed files with 15 additions and 15 deletions.
4 changes: 2 additions & 2 deletions docs/Makefile
@@ -1,4 +1,4 @@
-TEMPDIR := $(shell mktemp -d -t tmp)
+TEMPDIR := $(shell mktemp -d -t tmp.XXX)

publish:
echo 'hmmm'
@@ -8,5 +8,5 @@ publish:
git init && \
git add . && \
git commit -m 'publish site' && \
-git remote add origin git@github.com:ermongroup/cs228-notes.git && \
+git remote add origin https://github.com/ermongroup/cs228-notes.git && \
git push origin master:refs/heads/gh-pages --force
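
For context on the TEMPDIR change above: GNU coreutils mktemp rejects a template that contains no X placeholders, whereas BSD/macOS mktemp treats the -t argument as a bare prefix, so the added .XXX suffix presumably makes the recipe work on Linux as well. A minimal sketch of the difference (paths and messages are illustrative, not captured output):

# GNU mktemp with a bare prefix fails with a "too few X's in template" error
mktemp -d -t tmp
# With at least three X's it prints a fresh directory, e.g. /tmp/tmp.4fA
mktemp -d -t tmp.XXX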
22 changes: 11 additions & 11 deletions docs/preliminaries/probabilityreview/index.html
@@ -186,10 +186,10 @@ <h2 id="22-probability-mass-functions">2.2 Probability mass functions</h2>
<p>When a random variable X takes on a finite set of possible values (i.e., X is a discrete random
variable), a simpler way to represent the probability measure associated with a random variable is
to directly specify the probability of each value that the random variable can assume. In particular,
-a probability mass function (PMF) is a function <script type="math/tex">p_X : \Omega \rightarrow I\!R</script> such that
+a probability mass function (PMF) is a function <script type="math/tex">pX : \Omega \rightarrow I\!R</script> such that
<script type="math/tex">p_X(x) = P(X = x)</script>.</p>

-<p>In the case of discrete random variable, we use the notation Val(X) for the set of possible values that the random variable X may assume. For example, if X(ω) is a random variable indicating the number of heads out of ten tosses of coin, then Val(X) = {0, 1, 2, . . . , 10}.</p>
+<p>In the case of discrete random variable, we use the notation Val(X) for the set of possible values that the random variable X may assume. For example, if X(ω) is a random variable indicating the number of heads out of ten tosses of coin, then V al(X) = {0, 1, 2, . . . , 10}.</p>
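
<p>As a concrete sketch of this definition, and assuming for illustration that the coin in the example above is fair, the PMF of the number of heads in ten tosses would be the binomial formula</p>

<p>\begin{equation}
p_X(k) = \binom{10}{k} \left(\frac{1}{2}\right)^{10}, \qquad k \in \text{Val}(X) = \{0, 1, \ldots, 10\}.
\end{equation}</p>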

<h3 id="properties-2"><strong>Properties</strong>:</h3>
<ul>
@@ -199,19 +199,19 @@ <h3 id="properties-2"><strong>Properties</strong>:</h3>
</ul>

<h2 id="23-probability-density-functions">2.3 Probability density functions</h2>
-<p>For some continuous random variables, the cumulative distribution function FX(x) is differentiable everywhere. In these cases, we define the Probability Density Function or PDF as the derivative of the CDF, i.e.,</p>
+<p>For some continuous random variables, the cumulative distribution function <script type="math/tex">F_X(x)</script> is differentiable everywhere. In these cases, we define the Probability Density Function or PDF as the derivative of the CDF, i.e.,</p>

<p>\begin{equation}
f_X(x) = \frac{dF_X(x)}{dx}.
\end{equation}</p>

-<p>Note here, that the PDF for a continuous random variable may not always exist (i.e., if FX(x) is not differentiable everywhere).</p>
+<p>Note here, that the PDF for a continuous random variable may not always exist (i.e., if <script type="math/tex">F_X(x)</script> is not differentiable everywhere).</p>

<p>According to the <strong>properties</strong> of differentiation, for very small ∆x,</p>

<p><script type="math/tex">P(x \leq X \leq x + \delta x) f_X(x)\delta x</script>.</p>

-<p>Both CDFs and PDFs (when they exist!) can be used for calculating the probabilities of different events. But it should be emphasized that the value of PDF at any given point <script type="math/tex">x</script> is not the probability of that event, i.e., <script type="math/tex">f_X(x) \neq P(X = x)</script>. For example, <script type="math/tex">f_X(x)</script> can take on values larger than one (but the integral of fX(x) over any subset of R will be at most one).</p>
+<p>Both CDFs and PDFs (when they exist!) can be used for calculating the probabilities of different events. But it should be emphasized that the value of PDF at any given point <script type="math/tex">x</script> is not the probability of that event, i.e., <script type="math/tex">f_X(x) \neq P(X = x)</script>. For example, <script type="math/tex">f_X(x)</script> can take on values larger than one (but the integral of <script type="math/tex">f_X(x)</script> over any subset of R will be at most one).</p>
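
<p>A quick illustration of this last point: if <script type="math/tex">X</script> is uniform on <script type="math/tex">[0, 1/2]</script>, then <script type="math/tex">f_X(x) = 2</script> on that interval, so the density exceeds one, yet <script type="math/tex">\int_0^{1/2} 2 \, dx = 1</script>.</p>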

<h3 id="properties-3"><strong>Properties</strong>:</h3>
<ul>
@@ -231,8 +231,8 @@ <h2 id="24-expectation">2.4 Expectation</h2>
<p>If X is a continuous random variable with PDF <script type="math/tex">f_X(x)</script>, then the expected value of g(X) is defined as,</p>

<p>\begin{equation}
-E[g(X)] = \int^{\infty}_{-\infty} g(x)f_X(x)dx.
-\end{equation}</p>
+E[g(X)] = \int^{\infty}_{-\infty} g(x)f_X(x)dx
+\end{equation}.</p>

<p>Intuitively, the expectation of g(X) can be thought of as a “weighted average” of the values that g(x) can take on for different values of x, where the weights are given by <script type="math/tex">p_X(x)</script> or <script type="math/tex">f_X(x)</script>. As a special case of the above, note that the expectation, E[X], of a random variable itself is found by letting g(x) = x; this is also known as the mean of the random variable X.</p>
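
<p>For instance, taking <script type="math/tex">g(x) = x</script> in the (hypothetical) fair-coin sketch above, the mean number of heads in ten tosses works out to <script type="math/tex">E[X] = \sum_{k=0}^{10} k \binom{10}{k} (1/2)^{10} = 5</script>.</p>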

@@ -293,8 +293,8 @@ <h3 id="discrete-random-variables">Discrete random variables</h3>
<p><strong><script type="math/tex">X</script> ∼ Bernoulli(p)</strong> (where <script type="math/tex">0 \leq p \leq 1</script>): one if a coin with heads probability p comes up heads, zero otherwise.</p>
<div class="mathblock"><script type="math/tex; mode=display">
p(x)=\begin{cases}
-p, & \text{if $$p = 1$$}. \\
-1-p, & \text{if $$p = 0$$}.
+p, & \text{if $$x = 1$$}. \\
+1-p, & \text{if $$x = 0$$}.
\end{cases}
</script></div>
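
<p>As a quick sanity check of this PMF, the expectation follows directly from the definition above: <script type="math/tex">E[X] = 0 \cdot (1 - p) + 1 \cdot p = p</script>.</p>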

@@ -360,7 +360,7 @@ <h3 id="properties-6"><strong>Properties</strong>:</h3>

<h2 id="32-joint-and-marginal-probability-mass-functions">3.2 Joint and marginal probability mass functions</h2>

-<p>If X and Y are discrete random variables, then the joint probability mass function <script type="math/tex">p_{XY} : I\!R \prod \!R \rightarrow [0, 1]</script> is defined by
+<p>If X and Y are discrete random variables, then the joint probability mass function <script type="math/tex">p_{XY} : I\!R \times \!R \rightarrow [0, 1]</script> is defined by
\begin{equation}
p_{XY}(x, y) = P(X = x, Y = y).
\end{equation}
@@ -388,7 +388,7 @@ <h2 id="33-joint-and-marginal-probability-density-functions">3.3 Joint and marginal probability density functions</h2>
\begin{equation}
\int \int_{x \in A} f_{XY} (x, y)dx dy = P((X, Y ) \in A).
\end{equation}
-Note that the values of the probability density function f_{XY}(x, y) are always nonnegative, but they
+Note that the values of the probability density function <script type="math/tex">f_{XY}(x, y)</script> are always nonnegative, but they
may be greater than 1. Nonetheless, it must be the case that <script type="math/tex">\int^{\infty}_{-\infty} \int^{\infty}_{-\infty} f_{XY}(x,y) \, dx dy = 1</script>.</p>
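
<p>As a small worked instance (added for illustration), the density <script type="math/tex">f_{XY}(x, y) = 4xy</script> on the unit square integrates to one, takes the value 4 at the corner <script type="math/tex">(1, 1)</script>, and has marginal density <script type="math/tex">f_X(x) = \int_0^1 4xy \, dy = 2x</script>.</p>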

<p>Analogous to the discrete case, we define</p>
4 changes: 2 additions & 2 deletions docs/representation/directed/index.html
@@ -115,7 +115,7 @@ <h3 id="graphical-representation">Graphical representation.</h3>

<p>As an example, consider a model of a student’s grade <script type="math/tex">g</script> on an exam; this grade depends on several factors: the exam’s difficulty <script type="math/tex">d</script>, the student’s intelligence <script type="math/tex">i</script>, his SAT score <script type="math/tex">s</script>; it also affects the quality <script type="math/tex">l</script> of the reference letter from the professor who taught the course. Each variable is binary, except for <script type="math/tex">g</script>, which takes 3 possible values.<label for="nb1" class="margin-toggle"></label><input type="checkbox" id="nb1" class="margin-toggle" /><span class="marginnote"><img class="fullwidth" src="/cs228-notes/assets/img/grade-model.png" /><br />Bayes net model describing the performance of a student on an exam. The distribution can be represented a product of conditional probability distributions specified by tables. The form of these distributions is described by edges in the graph.</span> The joint probability distribution over the 5 variables naturally factorizes as follows:</p>
<div class="mathblock"><script type="math/tex; mode=display">
-p(l, g, i, d, s) = p(l \mid g) p(g \mid i, d) p(i) p(d) p(s\mid d).
+p(l, g, i, d, s) = p(l \mid g) p(g \mid i, d) p(i) p(d) p(s\mid i).
</script></div>
<p>The graphical representation of this distribution is a DAG that visually specifies how random variables depend on each other. The graph clearly indicates that the letter depends on the grade, which in turn depends on the student’s intelligence and the difficulty of the exam.</p>
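
<p>A quick way to see the value of this factorization (a standard parameter count, added for illustration): a full table over the five variables has <script type="math/tex">2 \cdot 3 \cdot 2 \cdot 2 \cdot 2 = 48</script> entries, whereas the factored form needs only <script type="math/tex">2 + 2 + 4 + 12 + 6 = 26</script> table entries: two for <script type="math/tex">p(i)</script>, two for <script type="math/tex">p(d)</script>, four for <script type="math/tex">p(s \mid i)</script>, twelve for <script type="math/tex">p(g \mid i, d)</script>, and six for <script type="math/tex">p(l \mid g)</script>.</p>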

@@ -162,7 +162,7 @@ <h3 id="independencies-described-by-directed-graphs">Independencies described by directed graphs</h3>
<li><em>V-structure</em> (also known as <em>explaining away</em>): If <script type="math/tex">G</script> is <script type="math/tex">A \rightarrow C \leftarrow B</script>, then knowing <script type="math/tex">C</script> couples <script type="math/tex">A</script> and <script type="math/tex">B</script>. In other words, <script type="math/tex">A \perp B</script> if <script type="math/tex">C</script> is unobserved, but <span><script type="math/tex">A \not\perp B \mid C</script></span> if <script type="math/tex">C</script> is observed.</li>
</ul>

-<p>The latter case requires additional explanation. Suppose that <script type="math/tex">C</script> is a Boolean variable that indicates whether our lawn is wet one morning; <script type="math/tex">A</script> and <script type="math/tex">B</script> are two explanation for it being wet: either it rained (indicated by <script type="math/tex">A</script>), or the sprinkler turned on (indicated by <script type="math/tex">B</script>). If we know that the grass is wet (<script type="math/tex">C</script> is true) and the sprinkler didn’t go on (<script type="math/tex">B</script> is false), then the probability that <script type="math/tex">A</script> is true must be one, because that is the only other possible explanation. Hence, <script type="math/tex">C</script> and <script type="math/tex">A</script> are not independent given <script type="math/tex">B</script>.</p>
+<p>The latter case requires additional explanation. Suppose that <script type="math/tex">C</script> is a Boolean variable that indicates whether our lawn is wet one morning; <script type="math/tex">A</script> and <script type="math/tex">B</script> are two explanations for it being wet: either it rained (indicated by <script type="math/tex">A</script>), or the sprinkler turned on (indicated by <script type="math/tex">B</script>). If we know that the grass is wet (<script type="math/tex">C</script> is true) and the sprinkler didn’t go on (<script type="math/tex">B</script> is false), then the probability that <script type="math/tex">A</script> is true must be one, because that is the only other possible explanation. Hence, <script type="math/tex">A</script> and <script type="math/tex">B</script> are not independent given <script type="math/tex">C</script>.</p>
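
<p>A quick numerical sketch of this effect (with made-up numbers): let <script type="math/tex">A</script> and <script type="math/tex">B</script> be independent fair coin flips and <script type="math/tex">C = A \vee B</script>. Then <script type="math/tex">P(A = 1 \mid C = 1) = 2/3</script>, while <script type="math/tex">P(A = 1 \mid C = 1, B = 1) = 1/2</script> and <script type="math/tex">P(A = 1 \mid C = 1, B = 0) = 1</script>, so observing <script type="math/tex">C</script> makes <script type="math/tex">A</script> and <script type="math/tex">B</script> dependent.</p>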

<p>These structures clearly describe the independencies encoded by a three-variable Bayesian net. We can extend them to general networks by applying them recursively over any larger graph. This leads to a notion called <script type="math/tex">d</script>-separation (where <script type="math/tex">d</script> stands for directed).</p>


