\documentclass[10pt,twocolumn,letterpaper]{article}
\usepackage{iccv}
\usepackage{times}
\usepackage{epsfig}
\usepackage{graphicx}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{mathrsfs}
\usepackage{authblk}
\usepackage{placeins}
\usepackage[symbol*]{footmisc}
\DeclareMathOperator{\E}{\mathbb{E}}
% Include other packages here, before hyperref.
% If you comment hyperref and then uncomment it, you should delete
% egpaper.aux before re-running latex. (Or just hit 'q' on the first latex
% run, let it finish, and you should be clear).
\usepackage[pagebackref=true,breaklinks=true,letterpaper=true,colorlinks,bookmarks=false]{hyperref}
\iccvfinalcopy % *** Uncomment this line for the final submission
\def\iccvPaperID{2685} % *** Enter the ICCV Paper ID here
\def\httilde{\mbox{\tt\raisebox{-.5ex}{\symbol{126}}}}
% Pages are numbered in submission mode, and unnumbered in camera-ready
\ificcvfinal\pagestyle{empty}\fi
\begin{document}
%%%%%%%%% TITLE
\title{Realistic Dynamic Facial Textures from a Single Image using GANs: Supplementary Material}
\author[1,3,4]{Kyle Olszewski\thanks{[email protected] (equal contribution)}}
\author[1]{Zimo Li\thanks{[email protected] (equal contribution)}}
\author[1]{Chao Yang\thanks{[email protected] (equal contribution)}}
\author[1]{Yi Zhou\thanks{[email protected]}}
\author[1,3]{Ronald Yu\thanks{[email protected]}}
\author[1]{Zeng Huang\thanks{[email protected]}}
\author[1]{Sitao Xiang\thanks{[email protected]}}
\author[1,3]{Shunsuke Saito\thanks{[email protected]}}
\author[2]{Pushmeet Kohli\thanks{[email protected], project conducted while at MSR}}
\author[1,3,4]{Hao Li\thanks{[email protected]}}
\affil[1]{University of Southern California}
\affil[2]{DeepMind}
\affil[3]{Pinscreen}
\affil[4]{USC Institute for Creative Technologies}
\maketitle
\thispagestyle{empty}
\section{Additional Results and Evaluations}
More results and evaluations on video sequences of various test subjects can be seen in the supplemental video included with this submission. We also show examples of the facial albedo textures inferred from the target identity and the source expression sequence, with the estimated environment lighting factored out so that the textured faces can be relit under different illumination conditions.
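While we refer to the main paper for the details of the illumination estimation, such a factorization typically assumes Lambertian reflectance under low-order spherical harmonics lighting, so that the observed texture decomposes per-texel as
\begin{equation*}
T_i = A_i \sum_{b=1}^{9} l_b H_b(n_i),
\end{equation*}
where $A_i$ is the albedo at texel $i$, $n_i$ the corresponding surface normal, $H_b$ the spherical harmonics basis functions, and $l_b$ the estimated lighting coefficients; relighting then amounts to substituting new coefficients $l_b'$ while keeping the inferred albedo fixed.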
The retargeting and compositing results in the video demonstrate that, in addition to synthesizing the mouth interior, our system generates subtle wrinkles and deformations in the face texture that are too fine to be represented by the mesh used to fit the face model to the subject, yet noticeably enhance the realism of the expression sequences synthesized for the target image. (These sequences are shown slowed down to allow better visualization of the transient details created for each expression.) Furthermore, the wrinkles generated by our system do not correspond directly to those of the source expressions in the video, but instead vary with factors such as the appearance and age of the person depicted in the target image (see, for example, the retargeting to images of Brad Pitt and Angelina Jolie in the section ``Retargeting Results and Comparison with Static Texture'' of the supplemental video).
Our approach to compositing the final rendered image of the target subject into the source sequence requires that the faces be front-facing. However, our network architecture can synthesize dynamic textures even for non-frontal viewpoints of the source subject, as seen in Fig.~\ref{fig:replaceres}. Fig.~\ref{fig:replaceres2} shows a frontal source subject animating a non-frontal target subject; in this case, however, the occluded regions contain artifacts that may become visible in the final animation. Additional retargeting and compositing results can be found in Fig.~\ref{fig:compositenew}.
\section{Implementation, Training and Performance Details}
Our networks are implemented and trained using the Torch7 framework, with an NVIDIA Titan GPU accelerating both training and inference. Fig.~\ref{fig:error} shows the training and validation losses for both the generator and discriminator networks.
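For orientation, the plotted errors correspond to adversarial objectives of the standard form (any additional non-adversarial terms in the full objective are described in the main paper), in which the discriminator $D$ and generator $G$ respectively minimize
\begin{align*}
\mathcal{L}_D &= -\E_{x}\left[\log D(x)\right] - \E_{z}\left[\log\left(1-D(G(z))\right)\right],\\
\mathcal{L}_G &= -\E_{z}\left[\log D(G(z))\right],
\end{align*}
where $x$ denotes real training textures and $G(z)$ the synthesized textures conditioned on the network inputs.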
Below we list the average per-frame execution time for each stage of our texture generation and compositing pipeline.
While the 3D face model fitting approach described in Section 4 runs on the CPU and does not achieve realtime performance, [Thies et al. 2016] demonstrate that such an approach can be parallelized on the GPU to run in realtime. Likewise, the mouth interior synthesis approach described in Section 5.4 of the paper is implemented in Matlab on a single thread, and could therefore be accelerated through parallel processing (see the sketch following the list below). Thus, while the approach used for replacing the faces in the source video sequence with the rendered target faces is not designed to run in realtime, the remaining stages of the pipeline could be optimized to run at interactive framerates.
\begin{enumerate}
\item 3D face model fitting (Section 4): 5.6 seconds.
\item Texture inference (Section 5): 12 milliseconds.
\item Mouth interior synthesis (Section 5.4): 156 milliseconds.
\item Video face replacement (Section 6): 4.5 seconds.
\end{enumerate}
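As a concrete illustration of the parallelization noted above, the following sketch (hypothetical Python rather than our single-threaded Matlab implementation; \texttt{synthesize\_mouth\_frame} is a stand-in for the actual per-frame routine) distributes the independent frames of a sequence across worker processes:
\begin{verbatim}
# Hypothetical sketch: parallelizing a single-threaded
# per-frame stage (e.g., mouth interior synthesis)
# across the frames of a sequence.
from multiprocessing import Pool

def synthesize_mouth_frame(frame):
    # Stand-in for the per-frame mouth interior
    # synthesis step (~156 ms on a single thread).
    ...

def synthesize_all(frames, workers=8):
    # Frames are processed independently, so with W
    # workers throughput improves by up to a factor
    # of W, minus scheduling overhead.
    with Pool(workers) as pool:
        return pool.map(synthesize_mouth_frame, frames)
\end{verbatim}
The other CPU-bound stages could in principle be restructured similarly, provided their per-frame computations are independent.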
\section{Acknowledgements}
First, we would like to thank all participants who were videotaped. This research is supported in part by Adobe, Oculus \& Facebook, Huawei, Sony, the Google Faculty Research Award, the Okawa Foundation Research Grant, and the U.S. Army Research Laboratory (ARL) under contract W911NF-14-D-0005. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of ARL or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon.
\clearpage
\begin{figure}[th]
\begin{center}
\includegraphics[width=0.32\columnwidth]{figures/kylehao_transfer/raw000001.png} \\
\includegraphics[width=0.32\columnwidth]{figures/kylehao_transfer/raw000762.png}
\includegraphics[width=0.32\columnwidth]{figures/kylehao_transfer/static_000762.png}
\includegraphics[width=0.32\columnwidth]{figures/kylehao_transfer/dynamic_000762.png} \\
\includegraphics[width=0.32\columnwidth]{figures/kylehao_transfer/raw000837.png}
\includegraphics[width=0.32\columnwidth]{figures/kylehao_transfer/static_000837.png}
\includegraphics[width=0.32\columnwidth]{figures/kylehao_transfer/dynamic_000837.png} \\
\includegraphics[width=0.32\columnwidth]{figures/kylehao_transfer/raw000455.png}
\includegraphics[width=0.32\columnwidth]{figures/kylehao_transfer/static_000455.png}
\includegraphics[width=0.32\columnwidth]{figures/kylehao_transfer/dynamic_000455.png} \\
\includegraphics[width=0.32\columnwidth]{figures/kylehao_transfer/raw000066.png}
\includegraphics[width=0.32\columnwidth]{figures/kylehao_transfer/static_000066.png}
\includegraphics[width=0.32\columnwidth]{figures/kylehao_transfer/dynamic_000066.png}
\end{center}
\caption{Non-frontal face reenactment. The top row displays the target image. In the remaining rows, from left to right: the source image, the rendering with the static texture, and the rendering with the dynamic texture. The dynamic texture contains additional subtle details such as wrinkles, resulting in a more expressive and plausible image of the target subject. Note also that the synthesized mouth interior yields much more plausible renderings when the mouth is open.}
\vspace{-0.05in}
\label{fig:replaceres}
\end{figure}
\begin{figure}[th]
\begin{center}
\includegraphics[width=0.32\columnwidth]{figures/haokyle_transfer/raw000001.png} \\
\includegraphics[width=0.32\columnwidth]{figures/haokyle_transfer/raw000165.png}
%% \includegraphics[width=0.32\columnwidth]{figures/haokyle_transfer/static_000165.png}
\includegraphics[width=0.32\columnwidth]{figures/haokyle_transfer/dynamic_000165.png} \\
\includegraphics[width=0.32\columnwidth]{figures/haokyle_transfer/raw000103.png}
%% \includegraphics[width=0.32\columnwidth]{figures/haokyle_transfer/static_000103.png}
\includegraphics[width=0.32\columnwidth]{figures/haokyle_transfer/dynamic_000103.png} \\
\includegraphics[width=0.32\columnwidth]{figures/haokyle_transfer/raw000356.png}
%% \includegraphics[width=0.32\columnwidth]{figures/haokyle_transfer/static_000356.png}
\includegraphics[width=0.32\columnwidth]{figures/haokyle_transfer/dynamic_000356.png} \\
\includegraphics[width=0.32\columnwidth]{figures/haokyle_transfer/raw000681.png}
%% \includegraphics[width=0.32\columnwidth]{figures/haokyle_transfer/static_000681.png}
\includegraphics[width=0.32\columnwidth]{figures/haokyle_transfer/dynamic_000681.png}
\end{center}
\caption{Failure cases induced by an extreme non-frontal target image. The top row displays the target image. The remaining rows display the source expression image (left) and the rendered image with the synthesized dynamic texture (right). While we can synthesize details for the visible region of the target image, the originally occluded regions contain artifacts that become apparent once those regions are exposed to the camera.}
\vspace{-0.05in}
\label{fig:replaceres2}
\end{figure}
\begin{figure*}[h!]
\setlength\tabcolsep{1.5pt}
\centering
\begin{tabular}{cc}
\includegraphics[width=.50\textwidth]{figures/loss/trainErrorG.png}&
\includegraphics[width=.50\textwidth]{figures/loss/trainErrorD.png} \\
\includegraphics[width=.50\textwidth]{figures/loss/validErrorG.png}&
\includegraphics[width=.50\textwidth]{figures/loss/validErrorD.png} \\
Generator Error & Discriminator Error \\
\end{tabular}
\caption{Training loss (top row) and validation loss (bottom row) for the generator (left column) and discriminator (right column).}\label{fig:error}
\vspace{-0.05in}
\end{figure*}
\begin{figure*}[th]
\begin{center}
\includegraphics[width=1.00\textwidth]{figures/supplementalresults.pdf} \\
\end{center}
\caption{Additional retargeting and compositing results.}
\vspace{-0.05in}
\label{fig:compositenew}
\end{figure*}
\end{document}