
Commit

html update
liaoruowang committed May 20, 2017
1 parent 349f74c commit fa28d76
Showing 3 changed files with 15 additions and 15 deletions.
4 changes: 2 additions & 2 deletions docs/Makefile
@@ -1,4 +1,4 @@
-TEMPDIR := $(shell mktemp -d -t tmp)
+TEMPDIR := $(shell mktemp -d -t tmp.XXX)

publish:
echo 'hmmm'
@@ -8,5 +8,5 @@ publish:
git init && \
git add . && \
git commit -m 'publish site' && \
-git remote add origin git@github.com:ermongroup/cs228-notes.git && \
+git remote add origin https://github.com/ermongroup/cs228-notes.git && \
git push origin master:refs/heads/gh-pages --force
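
For context on the TEMPDIR change above: GNU coreutils mktemp rejects a template that contains no X placeholders, whereas BSD/macOS mktemp treats the -t argument as a bare prefix, so the added .XXX suffix presumably makes the recipe work on Linux as well. A minimal sketch of the difference (paths and messages are illustrative, not captured output):

# GNU mktemp with a bare prefix fails with a "too few X's in template" error
mktemp -d -t tmp
# With at least three X's it prints a fresh directory, e.g. /tmp/tmp.4fA
mktemp -d -t tmp.XXX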
22 changes: 11 additions & 11 deletions docs/preliminaries/probabilityreview/index.html
@@ -186,10 +186,10 @@ <h2 id="22-probability-mass-functions">2.2 Probability mass functions</h2>
<p>When a random variable X takes on a finite set of possible values (i.e., X is a discrete random
variable), a simpler way to represent the probability measure associated with a random variable is
to directly specify the probability of each value that the random variable can assume. In particular,
-a probability mass function (PMF) is a function <script type="math/tex">p_X : \Omega \rightarrow I\!R</script> such that
+a probability mass function (PMF) is a function <script type="math/tex">pX : \Omega \rightarrow I\!R</script> such that
<script type="math/tex">p_X(x) = P(X = x)</script>.</p>

-<p>In the case of discrete random variable, we use the notation Val(X) for the set of possible values that the random variable X may assume. For example, if X(ω) is a random variable indicating the number of heads out of ten tosses of coin, then Val(X) = {0, 1, 2, . . . , 10}.</p>
+<p>In the case of discrete random variable, we use the notation Val(X) for the set of possible values that the random variable X may assume. For example, if X(ω) is a random variable indicating the number of heads out of ten tosses of coin, then V al(X) = {0, 1, 2, . . . , 10}.</p>
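
<p>As a concrete sketch of this definition, and assuming for illustration that the coin in the example above is fair, the PMF of the number of heads in ten tosses would be the binomial formula</p>

<p>\begin{equation}
p_X(k) = \binom{10}{k} \left(\frac{1}{2}\right)^{10}, \qquad k \in \text{Val}(X) = \{0, 1, \ldots, 10\}.
\end{equation}</p>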

<h3 id="properties-2"><strong>Properties</strong>:</h3>
<ul>
@@ -199,19 +199,19 @@ <h3 id="properties-2"><strong>Properties</strong>:</h3>
</ul>

<h2 id="23-probability-density-functions">2.3 Probability density functions</h2>
-<p>For some continuous random variables, the cumulative distribution function FX(x) is differentiable everywhere. In these cases, we define the Probability Density Function or PDF as the derivative of the CDF, i.e.,</p>
+<p>For some continuous random variables, the cumulative distribution function <script type="math/tex">F_X(x)</script> is differentiable everywhere. In these cases, we define the Probability Density Function or PDF as the derivative of the CDF, i.e.,</p>

<p>\begin{equation}
f_X(x) = \frac{dF_X(x)}{dx}.
\end{equation}</p>

-<p>Note here, that the PDF for a continuous random variable may not always exist (i.e., if FX(x) is not differentiable everywhere).</p>
+<p>Note here, that the PDF for a continuous random variable may not always exist (i.e., if <script type="math/tex">F_X(x)</script> is not differentiable everywhere).</p>

<p>According to the <strong>properties</strong> of differentiation, for very small ∆x,</p>

<p><script type="math/tex">P(x \leq X \leq x + \delta x) f_X(x)\delta x</script>.</p>

-<p>Both CDFs and PDFs (when they exist!) can be used for calculating the probabilities of different events. But it should be emphasized that the value of PDF at any given point <script type="math/tex">x</script> is not the probability of that event, i.e., <script type="math/tex">f_X(x) \neq P(X = x)</script>. For example, <script type="math/tex">f_X(x)</script> can take on values larger than one (but the integral of fX(x) over any subset of R will be at most one).</p>
+<p>Both CDFs and PDFs (when they exist!) can be used for calculating the probabilities of different events. But it should be emphasized that the value of PDF at any given point <script type="math/tex">x</script> is not the probability of that event, i.e., <script type="math/tex">f_X(x) \neq P(X = x)</script>. For example, <script type="math/tex">f_X(x)</script> can take on values larger than one (but the integral of <script type="math/tex">f_X(x)</script> over any subset of R will be at most one).</p>
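
<p>A quick illustration of this last point: if <script type="math/tex">X</script> is uniform on <script type="math/tex">[0, 1/2]</script>, then <script type="math/tex">f_X(x) = 2</script> on that interval, so the density exceeds one, yet <script type="math/tex">\int_0^{1/2} 2 \, dx = 1</script>.</p>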

<h3 id="properties-3"><strong>Properties</strong>:</h3>
<ul>
@@ -231,8 +231,8 @@ <h2 id="24-expectation">2.4 Expectation</h2>
<p>If X is a continuous random variable with PDF <script type="math/tex">f_X(x)</script>, then the expected value of g(X) is defined as,</p>

<p>\begin{equation}
-E[g(X)] = \int^{\infty}_{-\infty} g(x)f_X(x)dx.
-\end{equation}</p>
+E[g(X)] = \int^{\infty}_{-\infty} g(x)f_X(x)dx
+\end{equation}.</p>

<p>Intuitively, the expectation of g(X) can be thought of as a “weighted average” of the values that g(x) can take on for different values of x, where the weights are given by <script type="math/tex">p_X(x)</script> or <script type="math/tex">f_X(x)</script>. As a special case of the above, note that the expectation, E[X], of a random variable itself is found by letting g(x) = x; this is also known as the mean of the random variable X.</p>
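
<p>For instance, taking <script type="math/tex">g(x) = x</script> in the (hypothetical) fair-coin sketch above, the mean number of heads in ten tosses works out to <script type="math/tex">E[X] = \sum_{k=0}^{10} k \binom{10}{k} (1/2)^{10} = 5</script>.</p>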

@@ -293,8 +293,8 @@ <h3 id="discrete-random-variables">Discrete random variables</h3>
<p><strong><script type="math/tex">X</script> ∼ Bernoulli(p)</strong> (where <script type="math/tex">0 \leq p \leq 1</script>): one if a coin with heads probability p comes up heads, zero otherwise.</p>
<div class="mathblock"><script type="math/tex; mode=display">
p(x)=\begin{cases}
-p, & \text{if $$p = 1$$}. \\
-1-p, & \text{if $$p = 0$$}.
+p, & \text{if $$x = 1$$}. \\
+1-p, & \text{if $$x = 0$$}.
\end{cases}
</script></div>
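
<p>As a quick sanity check of this PMF, the expectation follows directly from the definition above: <script type="math/tex">E[X] = 0 \cdot (1 - p) + 1 \cdot p = p</script>.</p>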

@@ -360,7 +360,7 @@ <h3 id="properties-6"><strong>Properties</strong>:</h3>

<h2 id="32-joint-and-marginal-probability-mass-functions">3.2 Joint and marginal probability mass functions</h2>

-<p>If X and Y are discrete random variables, then the joint probability mass function <script type="math/tex">p_{XY} : I\!R \prod \!R \rightarrow [0, 1]</script> is defined by
+<p>If X and Y are discrete random variables, then the joint probability mass function <script type="math/tex">p_{XY} : I\!R \times \!R \rightarrow [0, 1]</script> is defined by
\begin{equation}
p_{XY}(x, y) = P(X = x, Y = y).
\end{equation}
@@ -388,7 +388,7 @@ <h2 id="33-joint-and-marginal-probability-density-functions">3.3 Joint and marginal probability density functions</h2>
\begin{equation}
\int \int_{x \in A} f_{XY} (x, y)dx dy = P((X, Y ) \in A).
\end{equation}
-Note that the values of the probability density function f_{XY}(x, y) are always nonnegative, but they
+Note that the values of the probability density function <script type="math/tex">f_{XY}(x, y)</script> are always nonnegative, but they
may be greater than 1. Nonetheless, it must be the case that <script type="math/tex">\int^{\infty}_{-\infty} \int^{\infty}_{-\infty} f_{XY}(x,y) \, dx dy = 1</script>.</p>
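
<p>As a small worked instance (added for illustration), the density <script type="math/tex">f_{XY}(x, y) = 4xy</script> on the unit square integrates to one, takes the value 4 at the corner <script type="math/tex">(1, 1)</script>, and has marginal density <script type="math/tex">f_X(x) = \int_0^1 4xy \, dy = 2x</script>.</p>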

<p>Analogous to the discrete case, we define</p>
4 changes: 2 additions & 2 deletions docs/representation/directed/index.html
@@ -115,7 +115,7 @@ <h3 id="graphical-representation">Graphical representation.</h3>

<p>As an example, consider a model of a student’s grade <script type="math/tex">g</script> on an exam; this grade depends on several factors: the exam’s difficulty <script type="math/tex">d</script>, the student’s intelligence <script type="math/tex">i</script>, his SAT score <script type="math/tex">s</script>; it also affects the quality <script type="math/tex">l</script> of the reference letter from the professor who taught the course. Each variable is binary, except for <script type="math/tex">g</script>, which takes 3 possible values.<label for="nb1" class="margin-toggle"></label><input type="checkbox" id="nb1" class="margin-toggle" /><span class="marginnote"><img class="fullwidth" src="/cs228-notes/assets/img/grade-model.png" /><br />Bayes net model describing the performance of a student on an exam. The distribution can be represented a product of conditional probability distributions specified by tables. The form of these distributions is described by edges in the graph.</span> The joint probability distribution over the 5 variables naturally factorizes as follows:</p>
<div class="mathblock"><script type="math/tex; mode=display">
-p(l, g, i, d, s) = p(l \mid g) p(g \mid i, d) p(i) p(d) p(s\mid d).
+p(l, g, i, d, s) = p(l \mid g) p(g \mid i, d) p(i) p(d) p(s\mid i).
</script></div>
<p>The graphical representation of this distribution is a DAG that visually specifies how random variables depend on each other. The graph clearly indicates that the letter depends on the grade, which in turn depends on the student’s intelligence and the difficulty of the exam.</p>
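
<p>A quick way to see the value of this factorization (a standard parameter count, added for illustration): a full table over the five variables has <script type="math/tex">2 \cdot 3 \cdot 2 \cdot 2 \cdot 2 = 48</script> entries, whereas the factored form needs only <script type="math/tex">2 + 2 + 4 + 12 + 6 = 26</script> table entries: two for <script type="math/tex">p(i)</script>, two for <script type="math/tex">p(d)</script>, four for <script type="math/tex">p(s \mid i)</script>, twelve for <script type="math/tex">p(g \mid i, d)</script>, and six for <script type="math/tex">p(l \mid g)</script>.</p>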

@@ -162,7 +162,7 @@ <h3 id="independencies-described-by-directed-graphs">Independencies described by directed graphs</h3>
<li><em>V-structure</em> (also known as <em>explaining away</em>): If <script type="math/tex">G</script> is <script type="math/tex">A \rightarrow C \leftarrow B</script>, then knowing <script type="math/tex">C</script> couples <script type="math/tex">A</script> and <script type="math/tex">B</script>. In other words, <script type="math/tex">A \perp B</script> if <script type="math/tex">C</script> is unobserved, but <span><script type="math/tex">A \not\perp B \mid C</script></span> if <script type="math/tex">C</script> is observed.</li>
</ul>

-<p>The latter case requires additional explanation. Suppose that <script type="math/tex">C</script> is a Boolean variable that indicates whether our lawn is wet one morning; <script type="math/tex">A</script> and <script type="math/tex">B</script> are two explanation for it being wet: either it rained (indicated by <script type="math/tex">A</script>), or the sprinkler turned on (indicated by <script type="math/tex">B</script>). If we know that the grass is wet (<script type="math/tex">C</script> is true) and the sprinkler didn’t go on (<script type="math/tex">B</script> is false), then the probability that <script type="math/tex">A</script> is true must be one, because that is the only other possible explanation. Hence, <script type="math/tex">C</script> and <script type="math/tex">A</script> are not independent given <script type="math/tex">B</script>.</p>
+<p>The latter case requires additional explanation. Suppose that <script type="math/tex">C</script> is a Boolean variable that indicates whether our lawn is wet one morning; <script type="math/tex">A</script> and <script type="math/tex">B</script> are two explanations for it being wet: either it rained (indicated by <script type="math/tex">A</script>), or the sprinkler turned on (indicated by <script type="math/tex">B</script>). If we know that the grass is wet (<script type="math/tex">C</script> is true) and the sprinkler didn’t go on (<script type="math/tex">B</script> is false), then the probability that <script type="math/tex">A</script> is true must be one, because that is the only other possible explanation. Hence, <script type="math/tex">A</script> and <script type="math/tex">B</script> are not independent given <script type="math/tex">C</script>.</p>
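
<p>A quick numerical sketch of this effect (with made-up numbers): let <script type="math/tex">A</script> and <script type="math/tex">B</script> be independent fair coin flips and <script type="math/tex">C = A \vee B</script>. Then <script type="math/tex">P(A = 1 \mid C = 1) = 2/3</script>, while <script type="math/tex">P(A = 1 \mid C = 1, B = 1) = 1/2</script> and <script type="math/tex">P(A = 1 \mid C = 1, B = 0) = 1</script>, so observing <script type="math/tex">C</script> makes <script type="math/tex">A</script> and <script type="math/tex">B</script> dependent.</p>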

<p>These structures clearly describe the independencies encoded by a three-variable Bayesian net. We can extend them to general networks by applying them recursively over any larger graph. This leads to a notion called <script type="math/tex">d</script>-separation (where <script type="math/tex">d</script> stands for directed).</p>


