\documentclass[11pt, oneside]{article} % use "amsart" instead of "article" for AMSLaTeX format
\usepackage{graphicx}
\newcommand{\problem}[1]{%\addtocounter{problemc}{1}
\item {#1}
}
\newcommand{\probl}[1]{\label{#1}}
\def\be{\begin{equation}}
\def\ee{\end{equation}}
\def\bea{\begin{eqnarray}}
\def\eea{\end{eqnarray}}
\newcommand{\vs}{\nonumber\\}
\def\across{a^\times}
\def\tcross{T^\times}
\def\ccross{C^\times}
\newcommand{\ec}[1]{Eq.~(\ref{eq:#1})}
\newcommand{\eec}[2]{Eqs.~(\ref{eq:#1}) and (\ref{eq:#2})}
\newcommand{\Ec}[1]{(\ref{eq:#1})}
\newcommand{\eql}[1]{\label{eq:#1}}
\newcommand{\sfig}[2]{
\includegraphics[width=#2]{#1}
}
\newcommand{\sfigr}[2]{
\includegraphics[angle=270,origin=c,width=#2]{#1}
}
\newcommand{\sfigra}[2]{
\includegraphics[angle=90,origin=c,width=#2]{#1}
}
\newcommand{\Sfig}[2]{
\begin{figure}[thbp]
\begin{center}
\sfig{#1.pdf}{0.5\columnwidth}
\caption{{\small #2}}
\label{fig:#1}
\end{center}
\end{figure}
}
\newcommand{\Spng}[2]{
\begin{figure}[thbp]
\begin{center}
\sfig{#1.png}{0.7\columnwidth}
\caption{{\small #2}}
\label{fig:#1}
\end{center}
\end{figure}
}
\newcommand{\Stwo}[3]{
\begin{figure}[thbp]
\begin{center}
\sfig{#1.png}{0.45\columnwidth}
\sfig{#2.png}{0.45\columnwidth}
\caption{{\small #3}}
\label{fig:#1}
\end{center}
\end{figure}
}
\newcommand\dirac{\delta_D}
\newcommand{\rf}[1]{\ref{fig:#1}}
\newcommand\rhoc{\rho_{\rm cr}}
\newcommand\zs{D_S}
\newcommand\dts{\Delta t_{\rm Sh}}
\newcommand\zle{D_L}
\newcommand\zsl{D_{SL}}
\newcommand\sh{\gamma}
\newcommand\surb{\mathcal{S}}
\newcommand\psf{\mathcal{P}}
\newcommand\spsf{\sigma_{\rm PSF}}
\newcommand\bei{\begin{itemize}}
\newcommand\eei{\end{itemize}}
%SetFonts
%SetFonts
\title{Wiener Filter}
%\author{The Author}
%\date{} % Activate to display a given date or no date
\begin{document}
\maketitle
\section{Gaussian Likelihood}
Suppose the data is modeled as signal plus noise, each of which is drawn from a Gaussian distribution.
Then,
\be
\mathcal{L} = P(d|C_s,C_n) = (2\pi C)^{-1/2} \exp\left\{-\frac12 \frac{d^2}{C} \right\}\eql{intl}\ee
where $C=C_s+C_n$ and $d^2/C$ is shorthand for $d^tC^{-1} d$.
This leads to a maximum likelihood estimate for the power spectrum $C_s$, obtained by maximizing the likelihood with respect to $C_s$. Moving to Fourier space and assuming everything is diagonal leads to
\be
\mathcal{L} \propto \prod_l (C_l+N_l)^{-(2l+1)/2}\, e^{-\sum_m\, |d_{lm}|^2/2(C_l+N_l)}.
\ee
Maximizing this with respect to $C_l$ leads to an estimator
\be
\hat C_l = \sum_m \frac{|d_{lm}|^2}{2l+1} - N_l.\eql{powint}\ee
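As a sanity check (a numerical sketch, not part of the original notes), the unbiasedness of this estimator is easy to verify for a single multipole with simulated real-valued modes standing in for the $d_{lm}$; the variances $C_l$, $N_l$ and the multipole $\ell$ below are arbitrary choices:

```python
import numpy as np

# Draw d_lm = s_lm + n_lm for one multipole l, with Var(s) = C_l and
# Var(n) = N_l, and check that hat_C = sum_m |d_lm|^2/(2l+1) - N_l
# recovers C_l on average.
rng = np.random.default_rng(0)
C_l, N_l, ell = 2.0, 1.0, 10
n_draws = 20000

d = (rng.normal(0.0, np.sqrt(C_l), (n_draws, 2 * ell + 1))
     + rng.normal(0.0, np.sqrt(N_l), (n_draws, 2 * ell + 1)))
hat_C = (d**2).sum(axis=1) / (2 * ell + 1) - N_l

print(hat_C.mean())  # close to C_l
```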
%\subsection{}
The likelihood in \ec{intl} implicitly integrates over all pixel values $s_{lm}$. The full conditional probability (using Bayes' theorem) is
\be
P(s,C_s|d,C_n) \propto \, (2\pi C_s)^{-1/2}\, \exp\left\{-\frac12 \frac{[d-s]^2}{C_n} \right\}
e^{-s^2/2C_s}.
\ee
When this is maximized, we find
\bea
s &=& \frac{C_s}{C_s+C_n}\, d\eql{swein}\\
C_s &=& s^2\eql{2dlike}
\eea
or more carefully in Fourier space
\bea
\hat s_{lm} &=& \frac{C_l}{C_l+N_l}\, d_{lm}\vs
\hat C_l &=& \sum_{m} \frac{|\hat s_{lm}|^2}{2l+1}
.\eea
We immediately see the problem: the maximum value of the likelihood occurs at values of $s$ such that
\be
\langle \hat C_l\rangle = \frac{C_l}{C_l+N_l}\, C_l
\ee
lower than the true value. This is the well-known issue with the Wiener filter:
%In Fourier space, the estimate of the map from data $d_{lm}$ can be obtained using the Weiner filter:
%\begin{equation}
%\hat s_{lm} = \frac{C_l}{C_l+N_l}\, d_{lm}
%\end{equation}
%where $C_l$ is the power spectrum of the signal and $N_l$ of the noise. We are assuming that $d=s+n$, each of which has mean zero, so
%\begin{equation}
%\langle d_{lm} d^*_{l'm'} \rangle = \delta_{ll'}\delta_{mm'}\, \left( C_l+ N_l\right).
%\end{equation}
%From this, it is easy to see that
%\begin{equation}
%\langle \hat s_{lm}\hat s^*_{l'm'}\rangle = \delta_{ll'}\delta_{mm'}\, S_l
%\end{equation}
%with the power spectrum of the estimated signal
%\begin{equation}
%S_l = \frac{C_l}{C_l+N_l}\, C_l.\end{equation}
on small scales where noise kicks in, the estimated power spectrum is suppressed.
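This suppression is easy to reproduce numerically (a sketch, not from the original notes; the values of $C$ and $N$ are arbitrary): filtering many draws of a single mode with the \emph{true} spectra still yields a map whose power is $C^2/(C+N)$, not $C$.

```python
import numpy as np

# Wiener-filter many draws of a single mode using the TRUE C and N,
# then measure the power of the filtered map.
rng = np.random.default_rng(1)
C, N = 1.0, 1.0
n_draws = 100000

d = rng.normal(0, np.sqrt(C), n_draws) + rng.normal(0, np.sqrt(N), n_draws)
s_hat = C / (C + N) * d          # Wiener filter with the true spectra

print(np.mean(s_hat**2))         # close to C^2/(C+N), below the true C
```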
\subsection{Single Pixel Example}
To understand this, consider the simplest possible case: one data point with known $C_n$. There are several ways to implement this:
\bei
\item {\it Traditional Wiener Filter:} This implements \ec{swein} assuming $C_s$ is known. Let's see, over 100 draws, how closely the Wiener filtered signal matches the true signal. Fig.~\rf{swein} shows that it performs well as expected, regardless of the signal to noise ratio: the estimated signal has a mean equal to the true value, and the spread is set by the noise, independent of how large the signal to noise is.
\Spng{swein}{Histogram of the Wiener filter estimate of the signal minus the true signal for 100 draws. Different histograms show two different values of signal to noise. In each case, the Wiener filter is constructed using the true values of $C_s$ and $C_n$.}
However, while the signal at the map level is correct, if we were to try to estimate the power spectrum from this map, we would fail.
\Spng{wpow}{A histogram of the estimated variance of the Wiener filtered map in this 1-pixel case, $\hat C_s=s_{\rm Wiener}^2$, compared to the true value of $C_s$. In both cases, the estimated power is low on average: the mean ratio is 0.5 and 0.7 in the $C_s=(1,4)C_n$ cases respectively.}
Fig.~\rf{wpow} shows that the extracted variance is smaller than the true variance. Note that the situation improves as the signal to noise gets larger, with the mean estimated variance about 50\% closer to the truth in the $C_s=4C_n$ case.
\item {\it Correct Power Spectrum Estimate:} We know how to estimate the variance correctly. We simply use the equivalent of \ec{powint}, in this case simply: $\hat C_s=d^2-C_n$.
\Spng{pow}{Same as Fig.~\rf{wpow} but this time using the estimate in \ec{powint}. The mean of the estimates is close to one in each case.}
Fig.~\rf{pow} shows that this estimate recovers the true power on average, albeit with significant noise.
\eei
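The two estimators above can be compared side by side in a short numerical sketch (an illustration under assumed values $C_s=4C_n$, not part of the original notes): with many draws, the power of the Wiener map converges to the biased ratio $C_s/(C_s+C_n)$, while $d^2-C_n$ is unbiased but noisy.

```python
import numpy as np

# One-pixel comparison: the Wiener-map power s_w^2 is biased low,
# while d^2 - C_n recovers C_s on average (with large scatter per draw).
rng = np.random.default_rng(2)
C_s, C_n = 4.0, 1.0
n_draws = 100000
d = (rng.normal(0, np.sqrt(C_s), n_draws)
     + rng.normal(0, np.sqrt(C_n), n_draws))

s_w = C_s / (C_s + C_n) * d        # traditional Wiener filter, true C_s known
print(np.mean(s_w**2) / C_s)       # close to C_s/(C_s+C_n) = 0.8: biased low
print(np.mean(d**2 - C_n) / C_s)   # close to 1: unbiased
```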
Let's go back to the two equations for $(s,C_s)$, \eec{swein}{2dlike}, and find the joint maximum likelihood point by solving them simultaneously. Substituting for $s$ leads to
\be
1 = d^2 \,\frac{C_s}{(C_s+C_n)^2}.\ee
The extrema are obtained by solving the quadratic equation
\be
C_s^2 + C_s\left[ 2C_n -d^2\right] +C_n^2 = 0.
\ee
If $d^2<4C_n$, then there is no extremum for positive $C_s$, so the maximum likelihood is at $s=C_s=0$. If $d^2>4C_n$, the extrema are at
\be
C_s = \frac12 \left[ d^2-2C_n\right] \pm \frac{d}2\sqrt{d^2-4C_n}
.\ee
So with $d=2C_n^{1/2}$, the quadratic has a double root at $C_s=C_n$, giving (for $C_n=1$) a maximum likelihood at $C_s=1$ and $s=1$.
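The root structure of this quadratic can be checked directly (a sketch, assuming $C_n=1$ as in the examples of this section):

```python
import numpy as np

# Solve C_s^2 + C_s(2 C_n - d^2) + C_n^2 = 0 for the two example data
# values, and recover s = C_s d / (C_s + C_n) at each root.
C_n = 1.0
for d in (2.0, 2.5):
    disc = d**2 * (d**2 - 4 * C_n)        # discriminant d^2 (d^2 - 4 C_n)
    if disc < 0:
        print(d, "no extremum for positive C_s: max at s = C_s = 0")
        continue
    roots = 0.5 * (d**2 - 2 * C_n) + 0.5 * np.array([-1, 1]) * np.sqrt(disc)
    s = d * roots / (roots + C_n)
    print(d, roots, s)  # d=2: double root C_s=1, s=1; d=2.5: C_s=(0.25, 4)
```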
Fig.~\rf{ss} shows the maximum likelihood values of $s$ and $C_s$ for 100 different experiments. Not surprisingly, neither is estimated very well: the unknown noise contribution degrades even the estimate of $s$. In fact, in the majority of draws, the maximum likelihood values of both $s$ and $C_s$ are close to zero (the uniform prior restricts $0.01<C_s<10$).
\Stwo{ss}{cs}{Left: maximum likelihood value of $s$ vs. true values. Most of the time, even in the case where $C_s=4C_n$, the maximum likelihood estimate is at $s=0$. Right: Maximum likelihood value of $C_s$ for the same 100 draws; again the maximum likelihood value often assumes the data is all noise.}
Fig.~\rf{l2.5} shows the likelihood for a specific value of the data point, $d=2.5C_n^{1/2}$. Taking $C_n=1$, the likelihood peaks at $C_s=4$, $s=2$. The degeneracy direction, though, allows for a much larger signal: a non-zero signal as large as 4.5 is possible (albeit with much lower probability than at the maximum). Integrating over all possible $s$ instead yields \ec{powint}, which in this example gives $\hat C_s=5.25C_n$.
So the maximum likelihood in the 2D space sits at a very different point, with lower $C_s$, than the marginalized 1D maximum. Fig.~\rf{l2.5} hints at the main reason: the much larger area occupied by the lower-likelihood points at larger values of $(s,C_s)$.
\Spng{l2.5}{Likelihood contour in the $s,C_s$ plane for a single data point $d=2.5C_n^{1/2}$. The maximum posterior point in the 2D parameter space is marked and has lower values of $s,C_s$ than the 1D marginalized values.}
%\Spng{like1d}{Marginalized likelihood in 1d for the same exact data point as depicted in Fig.~\rf{l2}.}
The situation is even more extreme when the signal is comparable to the noise. When $d=1$, Fig.~\rf{l1} shows that the maximum likelihood point is quite far from the 1D maximum.
\Spng{l1}{Same as Fig~\rf{l2.5} except $d=1$.}
\section{Multiple pixels}
Let's generalize in the simplest way possible: by adding another pixel. First, let's fix $C_s=C_n$. In this case, the maximum likelihood solution is at $s_i=d_i/2.$ When $C_s$ is left free, then the 3D maximum likelihood point is at
\bea
s_i &=& d_i \frac{C_s}{C_s+C_n}\vs
C_s &=& \frac{\sum_i s_i^2}{2}
\eea
Plugging the first equation into the second leads to an equation that determines $C_s$; defining $\bar d^2\equiv \sum_i d_i^2/2$,
\be
C_s = \left(\frac{C_s}{C_s+C_n} \right)^2\bar d^2.\ee
This reduces to another quadratic equation
\be
C_s^2 + C_s\left[ 2C_n - \bar d^2 \right] + C_n^2=0\ee
with solution
\be
C_s = \frac12 \left[ \bar d^2-2C_n\right] \pm \frac{\bar d}2\sqrt{\bar d^2-4C_n}
.\ee
Fig.~\rf{3d} shows the result in this case. The maximum of the 3D likelihood is at $s=0$ and $C_s$ as small as possible, but the marginalized likelihood again peaks at $\bar d^2-C_n$. In the example shown, $d=[1.14,1.38]$, so $\bar d^2=1.6$; the 1D best fit should therefore be $C_s=0.6$, and it is.
The bottom line is that, in this low signal to noise case, the Wiener filtered map using $C_s$ equal to its true value would yield $s=d/2$, and therefore the estimated power spectrum from the Wiener filtered map would be $\bar s^2=0.25\bar d^2 = 0.4$, much smaller than the true value.
\Spng{3d}{Two pixels drawn from $C_s=C_n=1$; the data here is $1.14,1.38$ leading to a 3D maximum likelihood at very low $s,C_s$ but a 1D max at $C_s=0.6$.}
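The numbers quoted for this two-pixel example follow directly from the formulas above; a short sketch (not part of the original notes) reproduces them:

```python
import numpy as np

# Two-pixel example with the data quoted in the text, d = [1.14, 1.38]:
# bar_d^2 = sum_i d_i^2 / 2, the marginalized best fit C_s = bar_d^2 - C_n,
# and the power of the Wiener map s_i = d_i/2 when C_s = C_n = 1.
d = np.array([1.14, 1.38])
C_n = 1.0

bar_d2 = np.sum(d**2) / 2
C_s_1d = bar_d2 - C_n                  # marginalized 1D best fit
wiener_power = np.sum((d / 2)**2) / 2  # = bar_d2 / 4

print(bar_d2, C_s_1d, wiener_power)    # approximately 1.6, 0.6, 0.4
```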
\end{document}