<!DOCTYPE html>
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=iso-8859-1">
<title>Vangelis Kalogerakis' home page</title>
<style>
#content {
MARGIN-LEFT: auto;
MARGIN-RIGHT: auto;
max-width: 1700px;
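/* Legacy width fallback for old IE; modern browsers ignore expression() and rely on max-width above. */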
width:expression(document.body.clientWidth > 1700? "1700": "auto" );
}
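/* Publication table cells: thin dotted top border only, used as a row separator. */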
.ptd {
border-right-style: none;
border-left-style: none;
border-top-style: dotted;
border-bottom-style: none;
border-top-width: thin;
border-right-width: thin;
border-bottom-width: thin;
border-left-width: thin;
border-top-color: #999;
border-right-color: #999;
border-bottom-color: #999;
border-left-color: #999;
}
a {
FONT-WEIGHT: bolder;
TEXT-DECORATION: none;
}
a:hover {
background-color: #FFCCFF;
}
a:link {
COLOR: #0033FF;
text-decoration: none;
}
a:visited {
COLOR: #0033FF;
text-decoration: none;
font-size: medium;
text-align: right;
}
body {
background-color: #F0FFFF;
margin: 0px;
padding: 0px;
text-align: justify;
font-family: Helvetica, "Trebuchet MS", Calibri, sans-serif;
color: rgb(0, 0, 0);
}
.ptable {
border-color: #999;
border-bottom-color: #999;
border-left-color: #999;
border-right-color: #999;
border-top-color: #999;
border: ridge;
}
.styleAbstract {
font-style: italic;
font-size: small;
text-align: justify;
line-height: 1.1;
}
.copyrights {
font-size: x-small;
color: #999;
font-weight: bold;
}
.smalllinks {
font-size: small;
}
.importantnote {
color: #D70404;
}
#footer {
CLEAR: both;
FONT-SIZE: small;
PADDING-BOTTOM: 10px;
PADDING-TOP: 20px;
text-align: right
}
IMG {
BORDER-TOP-STYLE: none;
BORDER-RIGHT-STYLE: none;
BORDER-LEFT-STYLE: none;
BORDER-BOTTOM-STYLE: none;
}
h1.title {
margin: 0px 10px 30px;
padding: 14px;
padding-top: 18px;
font-size: 100%;
line-height: 28px;
}
h1 {
margin: 20px 0px 0px 0px;
font-size: 175%;
line-height: 28px;
font-weight: 900;
color: rgb(153, 0, 0);
}
h2 {
margin: 10px 0px 0px 0px;
font-weight: normal;
font-size: 150%;
line-height: normal;
color: rgb(153, 0, 0);
}
h4 {
margin: 5px 0px 0px 0px;
font-weight: normal;
font-size: 100%;
line-height: normal;
color: rgb(153, 0, 0);
}
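/* Brown highlight used for student co-author names in the publication list. */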
.student-color {
font-weight: normal;
font-size: 100%;
color: rgb(165, 42, 42);
}
P {
MARGIN-TOP: 0px
}
UL {
MARGIN-TOP: 1px;
MARGIN-BOTTOM: 12px
}
DIV.abstract LI {
MARGIN: 1px
}
#people {
PADDING-LEFT: 10px;
MARGIN-LEFT: 0px;
LIST-STYLE-TYPE: none
}
</style>
</head>
<body>
<center>
<div id="content" align="left">
<table>
<tbody>
<tr align="left">
<td width="260" align="left"><img src="face.jpg" alt="picture" class="importantnote"
width="260"></td>
<td width="10" align="left"> </td>
<td width="100%" valign="middle" align="left">
<p align="left"><strong>Evangelos Kalogerakis<br>
Associate Professor</strong><br>
<a href="https://www.ece.tuc.gr/en/home">ECE</a>, <a href="https://www.tuc.gr/en/home">Technical University of Crete</a><br>
<strong> Also research affiliate at </strong><a href="https://cyens.org.cy/">CYENS</a><br>
<br>
<strong>Email:</strong> [first four letters of my last name] [DOT] ai [DOT] lab [AT] gmail [DOT] com<br>
</p>
</td>
</tr>
<tr>
<td colspan="4" align="left">
<div align="justify">
<p><br>
<strong> Quick links:</strong> <a href="#publications">Publications</a>, <a href="#courses">Course information</a>, <a href="#talks">Talks</a>,
<a href="#students">Students</a>, <a href="#services">Academic
Service</a>, <a href="https://scholar.google.com/citations?user=8TwcVQcAAAAJ">Google
Scholar</a>, <a href="https://www.youtube.com/user/vkalogerakis">YouTube</a></p>
<p><strong>Research Interests and short bio:</strong> Evangelos Kalogerakis' research deals with the development
of graphics+vision algorithms and techniques, empowered by
Machine Learning and Artificial Intelligence, to help people easily create and process
representations of the 3D visual world. He is particularly
interested in algorithms that generate 3D models of objects,
scenes, and animations, and that intelligently process 3D scans,
geometric data, collections of shapes, images, and video.
His research has been supported by the European Research Council (<a href="https://www.ece.tuc.gr/en/news/item/new-pan-european-distinction-of-the-ece-school-of-the-technical-university-of-crete-for-the-second-year-in-a-row">ERC consolidator grant</a>)
and grants from the National Science Foundation (NSF).
He is currently an Associate Professor at the School of Electrical and Computer Engineering at the Technical University of Crete, where, starting in 2025, he leads a research group focused on graphics and vision.
Previously, he was a tenured Associate Professor at the College of Information and Computer Sciences at the University of Massachusetts Amherst, which he initially joined as an Assistant Professor in 2012. Before that, he was a postdoctoral researcher at Stanford University from 2010 to 2012. He earned his PhD from the University of Toronto in 2010. His <a href="degrees/KALOGERAKIS_PhD_THESIS.pdf">PhD thesis </a> introduced machine learning techniques for geometry processing.
He has served as an Area Chair for CVPR, ICCV, ECCV, and NIPS, and on
technical paper committees for SIGGRAPH,
SIGGRAPH ASIA, Eurographics, and the Symposium on Geometry
Processing. He has also served as an Associate Editor on the
Editorial Boards of IEEE Transactions on Pattern Analysis
and Machine Intelligence (TPAMI) and IEEE Transactions on
Visualization & Computer Graphics (TVCG).
He co-chaired <a href="https://eg2024.cyens.org.cy/">Eurographics 2024</a>.
He was listed as one of the <a href="https://www.aminer.cn/ai2000/2021/cg">100 most cited computer graphics scholars</a> in the world for 2010-2020 by Tsinghua's
AMiner academic network.</p>
<p>
<strong>
<span class="importantnote">
I have multiple openings for MS/PhD/postdoc/intern positions funded by my ERC grant. If you are interested in applying to work with me, please first read <a href="erc.html">here</a> before reaching out to me! Note that I am on a leave of absence from UMass Amherst -- I am not hiring new students at UMass Amherst.</span></strong></p>
</div>
</td>
</tr>
</tbody>
</table>
<table class="ptable" cellspacing="0" cellpadding="5" border="3" align="center">
<tbody>
<tr valign="middle">
<td class="ptd" colspan="2" valign="middle">
<h2><a name="publications">Selected Publications</a></h2>
For a complete list, see <a href="https://scholar.google.com/citations?user=8TwcVQcAAAAJ">Google
Scholar</a>. My students' names appear in a <span class="student-color">brown font</span>.
</td>
</tr>
<tr valign="middle">
<td class="ptd" width="270" valign="middle"><a href="https://lodurality.github.io/GEM3D/"><img src="papers/sig24.png" alt="Image" width="300"></a></td>
<td class="ptd" height="225" valign="middle" align="right">
<div align="justify">
<p> <b>GEM3D: GEnerative Medial Abstractions for 3D Shape Synthesis</b><br>
<em>[<a href="https://arxiv.org/pdf/2402.16994.pdf">PAPER</a>][<a href="https://lodurality.github.io/GEM3D/">PAGE WITH CODE & DATA</a>] </em><br>
<i><span class="student-color">Dmitry Petrov, Pradyumn Goyal, Vikas Thamizharasan</span>, Vladimir G. Kim, Matheus Gadelha, Melinos Averkiou, Siddhartha Chaudhuri, Evangelos Kalogerakis</i>
<br>
<i>Proceedings of ACM SIGGRAPH 2024</i><br>
<span class="styleAbstract"><br>
<em>Abstract: </em>We introduce GEM3D -- a new deep, topology-aware generative model of 3D shapes. The key ingredient of our method is a neural skeleton-based representation encoding information on both shape topology and geometry. Through a denoising diffusion probabilistic model, our method first generates skeleton-based representations following the Medial Axis Transform (MAT), then generates surfaces through a skeleton-driven neural implicit formulation. The neural implicit takes into account the topological and geometric information stored in the generated skeleton representations to yield surfaces that are more topologically and geometrically accurate compared to previous neural field formulations. We discuss applications of our method in shape synthesis and point cloud reconstruction tasks, and evaluate our method both qualitatively and quantitatively. We demonstrate significantly more faithful surface reconstruction and diverse shape generation results compared to the state-of-the-art, also involving challenging scenarios of reconstructing and synthesizing structurally complex, high-genus shape surfaces from Thingi10K and ShapeNet. </span> </p>
</div>
</td>
</tr>
<tr valign="middle">
<td class="ptd" width="270" valign="middle"><a href="https://vikastmz.github.io/VecFusion/"><img src="papers/cvpr24vecfusion.png" alt="Image" width="300"></a></td>
<td class="ptd" height="225" valign="middle" align="right">
<div align="justify">
<p> <b>VecFusion: Vector Font Generation with Diffusion</b><br>
<em>[<a href="https://arxiv.org/pdf/2312.10540.pdf">PAPER</a>][<a href="https://vikastmz.github.io/VecFusion/">PAGE WITH CODE & DATA</a>] </em><br>
<i><span class="student-color">Vikas Thamizharasan*</span>, Difan Liu*, Shantanu Agarwal, Matthew Fisher, Michael Gharbi, Oliver Wang, Alec Jacobson, Evangelos Kalogerakis</i><br>
(* indicates equal contribution) <br>
<i>Proceedings of the Computer Vision and Pattern Recognition (CVPR) 2024<span class="importantnote"> (Selected as highlight)</span></i><br>
<span class="styleAbstract"><br>
<em>Abstract: </em>We present VecFusion, a new neural architecture that can generate vector fonts with varying topological structures and precise control point positions. Our approach is a cascaded diffusion model which consists of a raster diffusion model followed by a vector diffusion model. The raster model generates low-resolution, rasterized fonts with auxiliary control point information, capturing the global style and shape of the font, while the vector model synthesizes vector fonts conditioned on the low-resolution raster fonts from the first stage. To synthesize long and complex curves, our vector diffusion model uses a transformer architecture and a novel vector representation that enables the modeling of diverse vector geometry and the precise prediction of control points. Our experiments show that, in contrast to previous generative models for vector graphics, our new cascaded vector diffusion model generates higher quality vector fonts, with complex structures and diverse styles. </span> </p>
</div>
</td>
</tr>
<tr valign="middle">
<td class="ptd" width="270" valign="middle"><a href="https://vikastmz.github.io/NIVeL/"><img src="papers/cvpr24nivel.png" alt="Image" width="300"></a></td>
<td class="ptd" height="225" valign="middle" align="right">
<div align="justify">
<p> <b>NIVeL: Neural Implicit Vector Layers for Text-to-Vector Generation</b><br>
<em>[<a href="https://arxiv.org/pdf/2405.15217">PAPER</a>][<a href="https://vikastmz.github.io/NIVeL/">PAGE WITH CODE & DATA</a>] </em><br>
<i><span class="student-color">Vikas Thamizharasan</span>, Difan Liu, Matthew Fisher, Nanxuan Zhao, Evangelos Kalogerakis, Michal Lukáč</i>
<br>
<i>Proceedings of the Computer Vision and Pattern Recognition (CVPR) 2024</i><br>
<span class="styleAbstract"><br>
<em>Abstract: </em>The success of denoising diffusion models in representing rich data distributions over 2D raster images has prompted research on extending them to other data representations, such as vector graphics. Unfortunately, due to their variable structure and scarcity of vector training data, directly applying diffusion models on this domain remains a challenging problem. Using workarounds like optimization via Score Distillation Sampling (SDS) is also fraught with difficulty, as vector representations are non-trivial to directly optimize and tend to result in implausible geometries such as redundant or self-intersecting shapes. NIVeL addresses these challenges by reinterpreting the problem on an alternative, intermediate domain which preserves the desirable properties of vector graphics -- mainly sparsity of representation and resolution-independence. This alternative domain is based on neural implicit fields expressed in a set of decomposable, editable layers, which by construction allow for changes in topology while capturing the visual features of the modelled output. Based on our experiments, NIVeL produces text-to-vector graphics results of significantly better quality than the state-of-the-art.</span> </p>
</div>
</td>
</tr>
<tr valign="middle">
<td class="ptd" width="270" valign="middle"><a href="https://open3dis.github.io/"><img src="papers/cvpr24open.png" alt="Image" width="300"></a> </td>
<td class="ptd" height="225" valign="middle" align="right">
<div align="justify">
<p> <b>Open3DIS: Open-vocabulary 3D Instance Segmentation with 2D Mask Guidance</b><br>
<em>[<a href="https://arxiv.org/pdf/2312.10671.pdf">PAPER</a>] [<a href="https://open3dis.github.io/">PAGE WITH CODE & DATA</a>] </em><br>
<i>Phuc Nguyen*, <span class="student-color">Tuan Duc Ngo*</span>, Evangelos Kalogerakis, Chuang Gan, Anh Tran, Cuong Pham, Khoi Nguyen<br>
(* indicates equal contribution) <br>
</i>
<i>Proceedings of the Computer Vision and Pattern Recognition (CVPR) 2024</i><br>
<span class="styleAbstract"><br>
<em>Abstract: </em>We introduce Open3DIS, a novel solution designed to tackle the problem of Open-Vocabulary Instance Segmentation within 3D scenes. Objects within 3D environments exhibit diverse shapes, scales, and colors, making precise instance-level identification a challenging task. Recent advancements in Open-Vocabulary scene understanding have made significant strides in this area by employing class-agnostic 3D instance proposal networks for object localization and learning queryable features for each 3D mask. While these methods produce high-quality instance proposals, they struggle with identifying small-scale and geometrically ambiguous objects. The key idea of our method is a new module that aggregates 2D instance masks across frames and maps them to geometrically coherent point cloud regions as high-quality object proposals addressing the above limitations. These are then combined with 3D class-agnostic instance proposals to include a wide range of objects in the real world. To validate our approach, we conducted experiments on three prominent datasets, including ScanNet200, S3DIS, and Replica, demonstrating significant performance gains in segmenting objects with diverse categories over the state-of-the-art approaches. </span> </p>
</div>
</td>
</tr>
<tr valign="middle">
<td class="ptd" width="270" valign="middle"><a href="https://lodurality.github.io/ANISE/"><img src="papers/tvcg23.png" alt="Image" width="300"></a> </td>
<td class="ptd" height="225" valign="middle" align="right">
<div align="justify">
<p> <b> ANISE: Assembly-based Neural Implicit Surface rEconstruction</b><br>
<em>[<a href="https://arxiv.org/pdf/2205.13682">PAPER</a>] [<a href="https://lodurality.github.io/ANISE/">PAGE WITH CODE & DATA</a>] </em><br>
<i><span class="student-color">Dmitry Petrov</span>, Matheus Gadelha, Radomir Mech, Evangelos Kalogerakis </i>
<br>
<i>IEEE Transactions on Visualization and Computer Graphics, 2023</i><br>
<i>(also presented at SGP 2023)</i> <br>
<span class="styleAbstract"><br>
<em>Abstract: </em>
We present ANISE, a method that reconstructs a 3D shape from partial observations (images or sparse point clouds) using a part-aware neural implicit shape representation. The shape is formulated as an assembly of neural implicit functions, each representing a different part instance. In contrast to previous approaches, the prediction of this representation proceeds in a coarse-to-fine manner. Our model first reconstructs a structural arrangement of the shape in the form of geometric transformations of its part instances. Conditioned on them, the model predicts part latent codes encoding their surface geometry. Reconstructions can be obtained in two ways: (i) by directly decoding the part latent codes to part implicit functions, then combining them into the final shape; or (ii) by using part latents to retrieve similar part instances in a part database and assembling them in a single shape. We demonstrate that, when performing reconstruction by decoding part representations into implicit functions, our method achieves state-of-the-art part-aware reconstruction results from both images and sparse point clouds. When reconstructing shapes by assembling parts retrieved from a dataset, our approach significantly outperforms traditional shape retrieval methods even when significantly restricting the database size. We present our results in well-known sparse point cloud reconstruction and single-view reconstruction benchmarks. </span> </p>
</div>
</td>
</tr>
<tr valign="middle">
<td class="ptd" width="270" valign="middle"><a href="https://marios2019.github.io/CSN/"><img src="papers/sgp23.png" alt="Image" width="300"></a> </td>
<td class="ptd" height="225" valign="middle" align="right">
<div align="justify">
<p> <b>Cross-Shape Attention for Part Segmentation of 3D Point Clouds</b><br>
<em>[<a href="https://arxiv.org/pdf/2003.09053.pdf">PAPER</a>] [<a href="https://marios2019.github.io/CSN/">PAGE WITH CODE & DATA</a>] </em><br>
<i><span class="student-color">Marios Loizou*</span>, Siddhant Garg*, <span class="student-color">Dmitry Petrov*</span>, Melinos Averkiou, Evangelos Kalogerakis </i>
<br>
(* indicates equal contribution)<br>
<i>Computer Graphics Forum, vol. 42, no. 5</i><br>
<i>(also in the Proceedings of SGP 2023)</i><br>
<a href="https://paperswithcode.com/sota/3d-semantic-segmentation-on-partnet?p=cross-shape-graph-convolutional-networks"><img src="papers/paperswithcode-CSN.png" height="25" alt="Papers with Code -- Leaderboard"/></a><br>
<span class="styleAbstract">
<em>Abstract: </em>
We present a deep learning method that propagates point-wise feature representations across shapes within a collection for the purpose of 3D shape segmentation. We propose a cross-shape attention mechanism to enable interactions between a shape's point-wise features and those of other shapes. The mechanism assesses both the degree of interaction between points and also mediates feature propagation across shapes, improving the accuracy and consistency of the resulting point-wise feature representations for shape segmentation. Our method also proposes a shape retrieval measure to select suitable shapes for cross-shape attention operations for each test shape. Our experiments demonstrate that our approach yields state-of-the-art results in the popular PartNet dataset. </span> </p>
</div>
</td>
</tr>
<tr valign="middle">
<td class="ptd" width="270" valign="middle"><a href=""><img src="papers/miccai23.png" alt="Image" width="300"></a> </td>
<td class="ptd" height="225" valign="middle" align="right">
<div align="justify">
<p> <b>Machine Learning for Automated Mitral Regurgitation Detection from Cardiac Imaging</b><br>
<em>[<a href="https://arxiv.org/pdf/2310.04871.pdf">PAPER</a>] </em><br>
<i>Ke Xiao, Erik Learned-Miller, Evangelos Kalogerakis, James Priest, Madalina Fiterau</i>
<br>
<i>Proceedings of Medical Image Computing and Computer-Assisted Intervention - MICCAI 2023</i><br>
<span class="styleAbstract"><br>
<em>Abstract: </em>
Mitral regurgitation (MR) is a heart valve disease with potentially
fatal consequences that can only be forestalled through timely
diagnosis and treatment. Traditional diagnosis methods are expensive,
labor-intensive and require clinical expertise, posing a barrier to screening
for MR. To overcome this impediment, we propose a new semisupervised
model for MR classification called CUSSP. CUSSP operates
on cardiac magnetic resonance (CMR) imaging slices of the 4-chamber
view of the heart. It uses standard computer vision techniques and contrastive
models to learn from large amounts of unlabeled data, in conjunction
with specialized classifiers to establish the first ever automated
MR classification system using CMR imaging sequences. Evaluated on a
test set of 179 labeled sequences, CUSSP
attains an F1 score of 0.69 and a ROC-AUC score of 0.88, setting the
first benchmark result for detecting MR from CMR imaging sequences. </span> </p>
</div>
</td>
</tr>
<tr valign="middle">
<td class="ptd" width="270" valign="middle"><a href="https://zhan-xu.github.io/motion-rig/"><img src="papers/siga22.jpg" alt="Image" width="300"></a> </td>
<td class="ptd" height="225" valign="middle" align="right">
<div align="justify">
<p> <b> MoRig: Motion-Aware Rigging of Character Meshes from Point Clouds</b><br>
<em>[<a href="https://arxiv.org/pdf/2210.09463.pdf">PAPER</a>] [<a href="https://youtu.be/sPxfnQ8j07Y">VIDEO</a>] [<a href="https://zhan-xu.github.io/motion-rig/">PAGE WITH CODE & DATA</a>] </em><br>
<i><span class="student-color">Zhan Xu</span>, <span class="student-color">Yang Zhou</span>, Li Yi, Evangelos Kalogerakis </i>
<br>
<i>Proceedings of ACM SIGGRAPH ASIA 2022</i><i><span>
</span></i> <br>
<span class="styleAbstract"><br>
<em>Abstract: </em>
We present MoRig, a method that automatically rigs character meshes driven by single-view point cloud streams capturing the motion of performing characters. Our method is also able to animate the 3D meshes according to the captured point cloud motion. At the heart of our approach lies a deep neural network that encodes motion cues from the point clouds into features that are informative about the articulated parts of the performing character. These features guide the inference of an appropriate skeletal rig for the input mesh, which is then animated based on the input point cloud motion. Our method can rig and animate diverse characters, including humanoids, quadrupeds, and toys with varying articulations. It is designed to account for occluded regions in the input point cloud sequences and any mismatches in the part proportions between the input mesh and captured character. Compared to other rigging approaches that ignore motion cues, our method produces more accurate skeletal rigs, which are also more appropriate for re-targeting motion from captured characters. </span> </p>
</div>
</td>
</tr>
<tr valign="middle">
<td class="ptd" width="270" valign="middle"><a href="https://people.cs.umass.edu/~dliu/projects/ASSET/"><img src="papers/sig22.jpg" alt="Image" width="300"></a> </td>
<td class="ptd" height="225" valign="middle" align="right">
<div align="justify">
<p> <strong> ASSET: Autoregressive Semantic Scene Editing with Transformers at High Resolutions</strong><br>
<em>[<a href="papers/more/ASSET.pdf">PAPER</a>]
[<a href="https://people.cs.umass.edu/~dliu/projects/ASSET/">PAGE WITH CODE & DATA</a>]</em><br>
<i><span class="student-color">Difan Liu</span>, Sandesh Shetty, Tobias Hinz, Matthew Fisher, Richard Zhang, Taesung Park, Evangelos Kalogerakis </i>
<br>
<i>ACM Transactions on Graphics, Vol. 41, No. 4, 2022
</i><br>
<i>(also in the Proceedings of ACM SIGGRAPH 2022)</i><i><span>
</span></i> <br>
<span class="styleAbstract"><br>
<em>Abstract: </em>
We present ASSET, a neural architecture for automatically modifying an input high-resolution image according to a user's edits on its semantic segmentation map. Our architecture is based on a transformer with a novel attention mechanism. Our key idea is to sparsify the transformer's attention matrix at high resolutions, guided by dense attention extracted at lower image resolution. While previous attention mechanisms are computationally too expensive for handling high-resolution images or are overly constrained within specific image regions hampering long-range interactions, our proposed attention mechanism is both computationally efficient and effective. Our sparsified attention mechanism is able to capture long-range interactions and context, leading to synthesizing interesting phenomena in scenes, such as reflections of landscapes onto water or flora consistent with the rest of the landscape, that were not possible to generate reliably with previous convnets and transformer approaches. We present qualitative and quantitative results, along with user studies, demonstrating the effectiveness of our method. </span> </p>
</div>
</td>
</tr>
<tr valign="middle">
<td class="ptd" width="270" valign="middle"><a href="https://nv-tlabs.github.io/MvDeCor/"><img src="papers/eccv22.jpg" alt="Image" width="300"></a> </td>
<td class="ptd" height="225" valign="middle" align="right">
<div align="justify">
<p> <strong> MvDeCor: Multi-view Dense Correspondence Learning for Fine-Grained 3D Segmentation</strong><br>
<em>[<a href="papers/more/MvDeCor.pdf">PAPER</a>]
[<a href="https://nv-tlabs.github.io/MvDeCor/">PAGE WITH CODE & DATA</a>]</em><br>
<i><span class="student-color">Gopal Sharma</span>, Kangxue Yin, Subhransu Maji, Evangelos Kalogerakis, Or Litany, Sanja Fidler</i>
<br>
<i>Proceedings of the European Conference on Computer Vision (ECCV) 2022</i><i><span>
</span></i> <br>
<span class="styleAbstract"><br>
<em>Abstract: </em>
We propose to utilize self-supervised techniques in the 2D
domain for fine-grained 3D shape segmentation tasks. This is inspired by
the observation that view-based surface representations are more effective
at modeling high-resolution surface details and texture than their 3D
counterparts based on point clouds or voxel occupancy. Specifically, given
a 3D shape, we render it from multiple views, and set up a dense correspondence
learning task within the contrastive learning framework. As a
result, the learned 2D representations are view-invariant and geometrically
consistent, leading to better generalization when trained on a limited
number of labeled shapes than alternatives based on self-supervision
in 2D or 3D alone. Experiments on textured (RenderPeople) and untextured
(PartNet) 3D datasets show that our method outperforms state-of-the-art
alternatives in fine-grained part segmentation. The improvements
over baselines are greater when only a sparse set of views is available for
training or when shapes are textured, indicating that MvDeCor benefits
from both 2D processing and 3D geometric reasoning. </span> </p>
</div>
</td>
</tr>
<tr valign="middle">
<td class="ptd" width="270" valign="middle"><a href="https://yzhou359.github.io/video_reenact/"><img src="papers/cvpr22vmg.png" alt="Image" width="300"></a> </td>
<td class="ptd" height="225" valign="middle" align="right">
<div align="justify">
<p> <strong> Audio-driven Neural Gesture Reenactment with Video Motion Graphs</strong><br>
<em>[<a href="papers/more/VideoMotionGraphs.pdf">PAPER</a>]
[<a href="https://yzhou359.github.io/video_reenact/">PAGE WITH CODE & DATA</a>]</em><br>
<i><span class="student-color">Yang Zhou</span>, Jimei Yang, Dingzeyu Li, Jun Saito, Deepali Aneja, Evangelos Kalogerakis </i>
<br>
<i>Proceedings of the Computer Vision and Pattern Recognition (CVPR) 2022</i><i><span>
</span></i> <br>
<span class="styleAbstract"><br>
<em>Abstract: </em>
Human speech is often accompanied by body gestures including arm and hand gestures. We present a method that reenacts a high-quality video with gestures matching a target speech audio. The key idea of our method is to split and re-assemble clips from a reference video through a novel video motion graph encoding valid transitions between clips. To seamlessly connect different clips in the reenactment, we propose a pose-aware video blending network which synthesizes video frames around the stitched frames between two clips. Moreover, we developed an audio-based gesture searching algorithm to find the optimal order of the reenacted frames. Our system generates reenactments that are consistent with both the audio rhythms and the speech content. We evaluate our synthesized video quality quantitatively, qualitatively, and with user studies, demonstrating that our method produces videos of much higher quality and consistency with the target audio compared to previous work and baselines.
</span> </p>
</div>
</td>
</tr>
<tr valign="middle">
<td class="ptd" width="270" valign="middle"><a href="https://zhan-xu.github.io/parts/"><img src="papers/cvpr22apes.png" alt="Image" width="300"></a> </td>
<td class="ptd" height="225" valign="middle" align="right">
<div align="justify">
<p> <strong> APES: Articulated Part Extraction from Sprite Sheets</strong><br>
<em>[<a href="papers/more/APES.pdf">PAPER</a>]
[<a href="https://zhan-xu.github.io/parts/">PAGE WITH CODE & DATA</a>]</em><br>
<i><span class="student-color">Zhan Xu</span>, Matthew Fisher, <span class="student-color">Yang Zhou</span>, Deepali Aneja, Rushikesh Dudhat, Li Yi, Evangelos Kalogerakis</i>
<br>
<i>Proceedings of the Computer Vision and Pattern Recognition (CVPR) 2022</i><i><span>
</span></i> <br>
<span class="styleAbstract"><br>
<em>Abstract: </em>
Rigged puppets are one of the most prevalent representations to create 2D character animations. Creating these puppets requires partitioning characters into independently moving parts. In this work, we present a method to automatically identify such articulated parts from a small set of character poses shown in a sprite sheet, which is an illustration of the character that artists often draw before puppet creation. Our method is trained to infer articulated body parts, e.g. head, torso and limbs, that can be re-assembled to best reconstruct the given poses. Our results demonstrate significantly better performance than alternatives qualitatively and quantitatively.</span> </p>
</div>
</td>
</tr>
<tr valign="middle">
<td class="ptd" width="270" valign="middle"><a href="https://hippogriff.github.io/prifit/"><img src="papers/sgp22.jpg" alt="Image" width="300"></a> </td>
<td class="ptd" height="225" valign="middle" align="right">
<div align="justify">
<p> <strong> PriFit: Learning to Fit Primitives Improves Few Shot Point Cloud Segmentation</strong><br>
<em>[<a href="papers/more/PriFit.pdf">PAPER</a>]
[<a href="https://hippogriff.github.io/prifit/">PAGE WITH CODE & DATA</a>]
</em><br>
<i><span class="student-color">Gopal Sharma</span>, Bidya Dash, Aruni RoyChowdhury, Matheus Gadelha, <span class="student-color">Marios Loizou</span>, Liangliang Cao, Rui Wang, Erik Learned-Miller, Subhransu Maji, Evangelos Kalogerakis</i>
<br>
<i>Computer Graphics Forum, Vol. 41, No. 5, 2022 <br>
(also in the Proceedings of SGP 2022)</i><i><span>
</span></i> <br>
<span class="styleAbstract"><br>
<em>Abstract: </em>
We present PRIFIT, a semi-supervised approach for label-efficient learning of 3D point cloud segmentation networks. PRIFIT
combines geometric primitive fitting with point-based representation learning. Its key idea is to learn point representations
whose clustering reveals shape regions that can be approximated well by basic geometric primitives, such as cuboids and ellipsoids.
The learned point representations can then be re-used in existing network architectures for 3D point cloud segmentation,
and improve their performance in the few-shot setting. According to our experiments on the widely used ShapeNet and PartNet
benchmarks, PRIFIT outperforms several state-of-the-art methods in this setting, suggesting that decomposability into primitives
is a useful prior for learning representations predictive of semantic parts. We present a number of ablative experiments
varying the choice of geometric primitives and downstream tasks to demonstrate the effectiveness of the method.</span> </p>
</div>
</td>
</tr>
<tr valign="middle">
<td class="ptd" width="270" valign="middle"><a href="https://buildingnet.org/"><img src="papers/iccv21bn.jpg" alt="Image" width="300"></a> </td>
<td class="ptd" height="225" valign="middle" align="right">
<div align="justify">
<p> <strong> BuildingNet: Learning to
Label 3D Buildings</strong><br>
<em>[<a href="papers/BuildingNet/BuildingNet.pdf">PAPER</a>][<a href="https://youtu.be/rl30WJo_EBo">VIDEO</a>][<a href="https://buildingnet.org/">PAGE WITH CODE & DATA</a>]</em><br>
<i>Pratheba Selvaraju, <span class="student-color">Mohamed Nabail</span>, <span class="student-color">Marios Loizou</span>, Maria Maslioukova, Melinos Averkiou, Andreas
Andreou, Siddhartha Chaudhuri, Evangelos Kalogerakis</i>
<br>
<i>Proceedings of the International Conference on
Computer Vision (ICCV) 2021 </i> <i><span class="importantnote">(Selected for Oral
Presentation)</span></i><br>
<span class="styleAbstract"><br>
<em>Abstract: </em>We introduce BuildingNet:
(a) a large-scale dataset of 3D building models whose
exteriors are consistently labeled, (b) a graph neural
network that labels building meshes by analyzing spatial
and structural relations of their geometric primitives. To
create our dataset, we used crowdsourcing combined with
expert guidance, resulting in 513K annotated mesh
primitives, grouped into 292K semantic part components
across 2K building models. The dataset covers several
building categories, such as houses, churches,
skyscrapers, town halls, libraries, and castles. We
include a benchmark for evaluating mesh and point cloud
labeling. Buildings have more challenging structural
complexity compared to objects in existing benchmarks
(e.g., ShapeNet, PartNet), thus, we hope that our dataset
can nurture the development of algorithms that are able to
cope with such large-scale geometric data for both vision
and graphics tasks e.g., 3D semantic segmentation,
part-based generative models, correspondences, texturing,
and analysis of point cloud data acquired from real-world
buildings. Finally, we show that our mesh-based graph
neural network significantly improves performance over
several baselines for labeling 3D meshes.</span> </p>
</div>
</td>
</tr>
<tr valign="middle">
<td class="ptd" width="270" valign="middle"><a href="https://people.cs.umass.edu/~dliu/projects/NeuralStrokes/"><img src="papers/iccv21ns.jpg" alt="Image" width="300"></a> </td>
<td class="ptd" height="225" valign="middle" align="right">
<div align="justify">
<p> <strong> Neural Strokes: Stylized
Line Drawing of 3D Shapes</strong><br>
<em>[<a href="papers/neural_strokes/NeuralStrokes.pdf">PAPER</a>] [<a href="https://people.cs.umass.edu/~dliu/projects/NeuralStrokes/">PAGE WITH CODE & DATA</a>]</em><br>
<i><span class="student-color">Difan Liu</span>, Matthew Fisher, Aaron Hertzmann, Evangelos Kalogerakis</i> <br>
<i>Proceedings of the International Conference on
Computer Vision (ICCV) 2021 </i><br>
<span class="styleAbstract"><br>
<em>Abstract: </em>This paper introduces a
model for producing stylized line drawings from 3D shapes.
The model takes a 3D shape and a viewpoint as input, and
outputs a drawing with textured strokes, with variations
in stroke thickness, deformation, and color learned from
an artist's style. The model is fully differentiable. We
train its parameters from a single training drawing of
another 3D shape. We show that, in contrast to previous
image-based methods, the use of a geometric representation
of 3D shape and 2D strokes allows the model to transfer
important aspects of shape and texture style while
preserving contours. Our method outputs the resulting
drawing in a vector representation, enabling richer
downstream analysis or editing in interactive
applications.</span> </p>
</div>
</td>
</tr>
<tr valign="middle">
<td class="ptd" width="270" valign="middle"><a href="papers/more/Urban_City_Texturing.pdf"><img src="papers/3dv_2021.png" alt="Image" width="300"></a> </td>
<td class="ptd" height="225" valign="middle" align="right">
<div align="justify">
<p> <strong> Projective Urban Texturing</strong><br>
<em>[<a href="papers/more/Urban_City_Texturing.pdf">PAPER</a>]</em><br>
<i><span class="student-color">Yiangos Georgiou</span>, Melinos Averkiou, Tom Kelly, Evangelos Kalogerakis</i> <br>
<i>Proceedings of the International Conference on 3D Vision (3DV) 2021</i><br>
<span class="styleAbstract"><br>
<em>Abstract: </em>
This paper proposes a method for automatic generation of textures for 3D city meshes in immersive urban environments. Many recent pipelines capture or synthesize large quantities of city geometry using scanners or procedural modeling pipelines. Such geometry is intricate and realistic, however the generation of photo-realistic textures for such large scenes remains a problem. We propose to generate textures for input target 3D meshes driven by the textural style present in readily available datasets of panoramic photos capturing urban environments. Re-targeting such 2D datasets to 3D geometry is challenging because
the underlying shape, size, and layout of the urban structures in the photos do not correspond to the ones in the target meshes. Photos also often have objects (e.g., trees, vehicles) that may not even be present in the target geometry. To address these issues we present a method, called Projective Urban Texturing (PUT), which re-targets textural style from real-world panoramic images to unseen urban meshes. PUT relies on contrastive and adversarial training of a neural architecture designed for unpaired image-to-texture translation. The generated textures are stored in a texture atlas applied to the target 3D mesh geometry.
We demonstrate both quantitative and qualitative evaluation of the generated textures.
</span> </p>
</div>
</td>
</tr>
<tr valign="middle">
<td class="ptd" width="270" valign="middle"><a href="https://people.umass.edu/%7Eyangzhou/MakeItTalk/"><img
src="papers/sigasia2020.png" alt="Image" width="300"></a> </td>
<td class="ptd" height="225" valign="middle" align="right">
<div align="justify">
<p> <strong> MakeItTalk: Speaker-Aware Talking Head Animation</strong><br>
[<a href="papers/makeittalk/MakeItTalk.pdf"><i>PAPER</i></a>][<a
href="https://youtu.be/vUMGKASgbf8">VIDEO</a>][<i><a href="https://people.umass.edu/%7Eyangzhou/MakeItTalk/">PAGE
WITH CODE & DATA</a></i>] <br>
<i><span class="student-color">Yang Zhou</span>, Xintong Han, Eli Shechtman, Jose
Echevarria, Evangelos Kalogerakis, Dingzeyu Li</i>
<br>
<i>ACM Transactions on Graphics, Vol. 39, No. 6,
2020<br>
(also in the Proceedings of ACM SIGGRAPH ASIA
2020)</i><br>
<span class="styleAbstract"><br>
<em>Abstract: </em>We present a method that
generates expressive talking-head videos from a single
facial image with audio as the only input. In contrast to
previous attempts to learn direct mappings from audio to
raw pixels for creating talking faces, our method first
disentangles the content and speaker information in the
input audio signal. The audio content robustly controls
the motion of lips and nearby facial regions, while the
speaker information determines the specifics of facial
expressions and the rest of the talking-head dynamics.
Another key component of our method is the prediction of
facial landmarks reflecting the speaker-aware dynamics.
Based on this intermediate representation, our method
works with many portrait images in a single unified
framework, including artistic paintings, sketches, 2D
cartoon characters, Japanese mangas, and stylized
caricatures. In addition, our method generalizes well for
faces and characters that were not observed during
training. We present extensive quantitative and
qualitative evaluation of our method, in addition to user
studies, demonstrating generated talking-heads of
significantly higher quality compared to prior
state-of-the-art methods.</span> </p>
</div>
</td>
</tr>
<tr valign="middle">
<td class="ptd" width="270" valign="middle"><a href="https://hippogriff.github.io/parsenet/"><img
src="papers/eccv2020b.jpg" alt="Image" width="300"></a> </td>
<td class="ptd" height="225" valign="middle" align="right">
<div align="justify">
<p> <b> ParSeNet: A Parametric Surface Fitting Network
for 3D Point Clouds</b><br>
<i>[</i><i><a href="papers/parsenet/parsenet.pdf">PAPER</a></i>]
<i>[</i><i><a href="https://hippogriff.github.io/parsenet/">PAGE
WITH CODE & DATA</a></i>] <br>
<i><span class="student-color">Gopal Sharma</span>, <span class="student-color">Difan Liu</span>, Subhransu Maji, Evangelos Kalogerakis, Siddhartha Chaudhuri, Radomir
Mech</i> <br>
<i>Proceedings of the European Conference on
Computer Vision (ECCV) 2020</i> <br>
<span class="styleAbstract"><br>
<em>Abstract: </em>We propose a novel,
end-to-end trainable, deep network called ParSeNet that
decomposes a 3D point cloud into parametric surface
patches, including B-spline patches as well as basic
geometric primitives. ParSeNet is trained on a large-scale
dataset of man-made 3D shapes and captures high-level
semantic priors for shape decomposition. It handles a much
richer class of primitives than prior work, and allows us
to represent surfaces with higher fidelity. It also
produces repeatable and robust parametrizations of a
surface compared to purely geometric approaches. We
present extensive experiments to validate our approach
against analytical and learning-based alternatives.</span>
</p>
</div>
</td>
</tr>
<tr valign="middle">
<td class="ptd" width="270" valign="middle"><a href="https://mgadelha.me/selfsupacd/"><img
src="papers/eccv2020a.png" alt="Image" width="300"></a> </td>
<td class="ptd" height="225" valign="middle" align="right">
<div align="justify">
<p><strong>Label-Efficient Learning on Point Clouds using
Approximate Convex Decompositions</strong><br>
<i>[</i><i><a href="papers/more/acd-eccv20.pdf">PAPER</a>]
</i><i>[<a href="https://mgadelha.me/selfsupacd/">PAGE WITH
CODE & DATA</a>]</i><br>
<i>Matheus Gadelha, Aruni RoyChowdhury, <span class="student-color">Gopal Sharma</span>, Evangelos Kalogerakis, Liangliang Cao, Erik
Learned-Miller, Rui Wang, Subhransu Maji<br>
Proceedings of the European Conference on Computer
Vision (ECCV) 2020</i> <span class="styleAbstract"><br>
<br>
<em>Abstract: </em>The problems of shape
classification and part segmentation from 3D point clouds
have garnered increasing attention in the last few years.
Both of these problems, however, suffer from relatively
small training sets, creating the need for statistically
efficient methods to learn 3D shape representations. In
this paper, we investigate the use of Approximate Convex
Decompositions (ACD) as a self-supervisory signal for
label-efficient learning of point cloud representations.
We show that using ACD to approximate ground truth
segmentation provides excellent self-supervision for
learning 3D point cloud representations that are highly
effective on downstream tasks. We report improvements over
the state-of-the-art for unsupervised representation
learning on the ModelNet40 shape classification dataset
and significant gains in few-shot part segmentation on the
ShapeNetPart dataset. </span><span class="styleAbstract"><br>
</span> </p>
</div>
</td>
</tr>
<tr valign="middle">
<td class="ptd" width="270" valign="middle"><a href="https://zhan-xu.github.io/rig-net/"><img
src="papers/siggraph2020.png" alt="Image" width="300"></a> </td>
<td class="ptd" height="225" valign="middle" align="right">
<div align="justify">
<p><span><strong>RigNet: Neural Rigging for Articulated
Characters</strong><br>
</span><i>[</i><span><i><a href="papers/rignet/rignet.pdf">PAPER</a>]</i></span><i>[<a
href="https://youtu.be/J90VETgWIDg">VIDEO</a>]</i><span><i>
[<a href="https://zhan-xu.github.io/rig-net/">PAGE
WITH CODE & DATA</a>]</i></span><span>
</span><span><br>
</span> <span><em><span class="student-color">Zhan Xu</span>, <span class="student-color">Yang Zhou</span>, Evangelos Kalogerakis, Chris Landreth, Karan Singh</em><br>
</span><i>ACM Transactions on Graphics,
Vol. 39, No. 4, 2020<span><br>
</span>(also in the Proceedings of ACM
SIGGRAPH 2020<span>)</span><span><br>
</span></i><span class="styleAbstract"><br>
<em>Abstract: </em> We present RigNet, an
end-to-end automated method for producing animation rigs
from input character models. Given an input 3D model
representing an articulated character, RigNet predicts a
skeleton that matches the animator expectations in joint
placement and topology. It also estimates surface skin
weights based on the predicted skeleton. Our method is
based on a deep architecture that directly operates on the
mesh representation without making assumptions on shape
class and structure. The architecture is trained on a
large and diverse collection of rigged models, including
their mesh, skeletons and corresponding skin weights. Our
evaluation is three-fold: we show better results than
prior art when quantitatively compared to animator rigs;
qualitatively we show that our rigs can be expressively
posed and animated at multiple levels of detail; and
finally, we evaluate the impact of various algorithm
choices on our output rigs. </span> </p>
<p><span class="styleAbstract"><br>
</span> </p>
</div>
</td>
</tr>
<tr valign="middle">
<td class="ptd" width="270" valign="middle"><a href="https://github.com/marios2019/learning_part_boundaries"><img
src="papers/sgp2020.png" alt="Image" width="300"></a> </td>
<td class="ptd" height="225" valign="middle" align="right">
<div align="justify">
<p><span><b>Learning Part Boundaries from 3D Point Clouds</b><br>
</span><i>[</i><span><i><a href="papers/more/learningpartboundaries.pdf">PAPER</a>]</i></span><span><i>
[<a href="https://github.com/marios2019/learning_part_boundaries">PAGE
WITH CODE & DATA</a>]</i></span><span>
</span><span><br>
</span> <span><span class="student-color">Marios Loizou</span>, Melinos Averkiou, Evangelos Kalogerakis<br>
</span><i>Computer Graphics Forum<span>,
</span>Vol. 39, No. 5, 2020<span><br>
</span>(also in the Proceedings of SGP 2020<span>)</span><span><br>
</span></i><span class="styleAbstract"><br>
<em>Abstract: </em>We present a method that
detects boundaries of parts in 3D shapes represented as
point clouds. Our method is based on a graph convolutional
network architecture that outputs a probability for a
point to lie in an area that separates two or more parts
in a 3D shape. Our boundary detector is quite generic: it
can be trained to localize boundaries of semantic parts or
geometric primitives commonly used in 3D modeling. Our
experiments demonstrate that our method can extract more
accurate boundaries that are closer to ground-truth ones
compared to alternatives. We also demonstrate an
application of our network to fine-grained semantic shape
segmentation, where we also show improvements in terms of
part labeling performance. </span> </p>
<p><span class="styleAbstract"><br>
</span> </p>
</div>
</td>
</tr>
<tr valign="middle">
<td class="ptd" width="270" valign="middle"><a href="https://people.cs.umass.edu/%7Edliu/projects/NeuralContours/"><img
src="papers/cvpr20.jpg" alt="Image" width="300"></a> </td>
<td class="ptd" height="225" valign="middle" align="right">
<div align="justify">
<p><span><strong>Neural Contours: Learning to Draw Lines from
3D Shapes</strong><br>
</span><i>[</i><span><i><a href="papers/neural_contours/NeuralContours.pdf">PAPER</a>]
[<a href="https://people.cs.umass.edu/%7Edliu/projects/NeuralContours/">PAGE
WITH CODE & DATA</a>]</i></span><span>
</span><span><br>
</span> <span><em><span class="student-color">Difan Liu</span>, <span class="student-color">Mohamed Nabail</span>, Evangelos Kalogerakis, Aaron Hertzmann</em><br>
</span><i>Proceedings of the Computer Vision and Pattern
Recognition (CVPR) 2020</i><i><span><br>
</span></i><span class="styleAbstract"><br>
<em>Abstract: </em> This paper introduces a
method for learning to generate line drawings from 3D
models. Our architecture incorporates a differentiable
module operating on geometric features of the 3D model,
and an image-based module operating on view-based shape
representations. At test time, geometric and view-based
reasoning are combined by a neural ranking module to
create a line drawing. The model is trained on a large
number of crowdsourced comparisons of line drawings.
Experiments demonstrate that our method achieves
significant improvements in line drawing over the
state-of-the-art when evaluated on standard benchmarks,
resulting in drawings that are comparable to those
produced by experienced human artists. </span> </p>
<p><span class="styleAbstract"><br>
</span> </p>
</div>
</td>
</tr>
<tr valign="middle">
<td class="ptd" width="270" valign="middle"> <a href="https://people.umass.edu/%7Eyangzhou/scenegraphnet/">
<img src="papers/iccv19.jpg" alt="Image" width="300"> </a> </td>
<td class="ptd" height="225" valign="middle" align="right">
<div align="justify">
<p><strong>SceneGraphNet: Neural Message Passing for 3D
Indoor Scene Augmentation<br>
</strong> <em> [</em><span><i><a href="papers/scenegraphnet/ICCV_SceneGraphNet.pdf">PAPER</a>]
[<a href="https://people.umass.edu/%7Eyangzhou/scenegraphnet/">PAGE
WITH CODE & DATA</a>]</i></span><span> </span><span><br>
</span><span><em><span class="student-color">Yang Zhou</span>, Zachary While, Evangelos Kalogerakis</em><br>
</span><i>Proceedings of the International Conference on
Computer Vision (ICCV) 2019</i><i><span><br>
</span></i><span class="styleAbstract"><br>
<em>Abstract: </em>In this paper we propose a
neural message passing approach to augment an input 3D
indoor scene with new objects matching their surroundings.
Given an input, potentially incomplete, 3D scene and a
query location, our method predicts a probability
distribution over object types that fit well in that
location. Our distribution is predicted though passing
learned messages in a dense graph whose nodes represent
objects in the input scene and edges represent spatial and
structural relationships. By weighting messages through an
attention mechanism, our method learns to focus on the
most relevant surrounding scene context to predict new
scene objects. We found that our method significantly
outperforms state-of-the-art approaches in terms of
correctly predicting objects missing in a scene based on
our experiments in the SUNCG dataset. We also demonstrate
other applications of our method, including context-based
3D object recognition and iterative scene generation. </span>
</p>
<p><span class="styleAbstract"><br>
</span> </p>
</div>
</td>
</tr>
<tr valign="middle">
<td class="ptd" width="270" valign="middle"> <a href="https://people.cs.umass.edu/%7Ezhanxu/projects/AnimSkelVolNet/">
<img src="papers/3dv19r.jpg" alt="Image" width="300"> </a> </td>
<td class="ptd" height="225" valign="middle" align="right">
<div align="justify">
<p><span><strong>Predicting Animation Skeletons for 3D
Articulated Models via Volumetric Nets</strong><br>
</span><i>[</i><span><i><a href="papers/AnimationSkeletons/AnimSkelVolNet.pdf">PAPER</a>]
[<a href="https://people.cs.umass.edu/%7Ezhanxu/projects/AnimSkelVolNet/">PAGE
WITH CODE & DATA</a>]</i></span> <span><br>
</span><span><em><span class="student-color">Zhan Xu</span>, <span class="student-color">Yang Zhou</span>, Evangelos Kalogerakis, Karan Singh</em><br>
</span><i>Proceedings of the
International Conference on 3D Vision (3DV) 2019<span class="importantnote"> (Selected for Oral
Presentation)</span></i><i><span><br>
</span></i><span class="styleAbstract"><br>
<em>Abstract: </em> We present a learning
method for predicting animation skeletons for input 3D
models of articulated characters. In contrast to previous
approaches that fit pre-defined skeleton templates or
predict fixed sets of joints, our method produces an
animation skeleton tailored for the structure and geometry
of the input 3D model. Our architecture is based on a
stack of hourglass modules trained on a large dataset of
3D rigged characters mined from the web. It operates on
the volumetric representation of the input 3D shapes
augmented with geometric shape features that provide
additional cues for joint and bone locations. Our method
also enables intuitive user control of the level-of-detail
for the output skeleton. Our evaluation demonstrates that
our approach predicts animation skeletons that are much
more similar to the ones created by humans compared to
several alternatives and baselines. <br>
</span> </p>
</div>
</td>
</tr>
<tr valign="middle">
<td class="ptd" width="270" valign="middle"> <a href="papers/more/PEN.pdf">
<img src="papers/3dv19s.jpg" alt="Image" width="300"> </a> </td>
<td class="ptd" height="225" valign="middle" align="right">
<div align="justify">
<p><span><strong>Learning Point Embeddings from Shape
Repositories for Few-Shot Segmentation</strong><br>
</span><i>[</i><span><i><a href="papers/more/PEN.pdf">PAPER</a>]</i></span>
<i><span class="importantnote"> </span></i><span><br>
</span><span><em><span class="student-color">Gopal Sharma</span>, Evangelos Kalogerakis,
Subhransu Maji</em><br>
</span><i>Proceedings of the International Conference on
3D Vision (3DV) 2019<span class="importantnote"> (Selected for Oral
Presentation)</span></i><i><span><br>
</span></i><span class="styleAbstract"><br>
<em>Abstract: </em> User generated 3D shapes
in online repositories contain rich information about
surfaces, primitives, and their geometric relations, often
arranged in a hierarchy. We present a framework for
learning representations of 3D shapes that reflect the
information present in this meta data and show that it
leads to improved generalization for semantic segmentation
tasks. Our approach is a point embedding network that
generates a vectorial representation of the 3D point such
that it reflects the grouping hierarchy and tag data. The
main challenge is that the data is highly variable and
noisy. To this end, we present tree-aware metrics to
supervise a metric-learning approach and demonstrate that
such learned embeddings offer excellent transfer to
semantic segmentation tasks, especially when training data
is limited. <br>
<br>
</span> </p>
</div>
</td>
</tr>
<tr valign="middle">
<td class="ptd" width="270" valign="middle"> <a href="https://github.com/ericyi/articulated-part-induction/">
<img src="papers/sigasia18.png" alt="Image" width="300"> </a>
</td>
<td class="ptd" height="225" valign="middle" align="right">
<div align="justify">
<p><span><strong>Deep Part Induction from Articulated Object
Pairs</strong><br>
</span><i> [</i><span><i><a href="https://kalo-ai.github.io/papers/partinduction/partinduction.pdf">PAPER</a>]
[<a href="https://github.com/ericyi/articulated-part-induction/">PAGE
WITH CODE & DATA</a>]</i></span><span> </span><span><br>
</span><span><em>Li Yi, <span class="student-color">Haibin Huang</span>, <span class="student-color">Difan Liu</span>, Evangelos Kalogerakis, Hao Su, Leonidas Guibas</em><br>
</span><i>ACM Transactions on Graphics, Vol. 37, No. 6,
2018<br>
(also in the Proceedings of ACM SIGGRAPH ASIA
2018<span>)</span></i><i><span><br>
</span></i><span class="styleAbstract"><br>
<em>Abstract: </em> Object functionality is
often expressed through part articulation -- as when the
two rigid parts of a scissor pivot against each other to
perform the cutting function. Such articulations are often
similar across objects within the same functional
category. In this paper, we explore how the observation of
different articulation states provides evidence for part
structure and motion of 3D objects. Our method takes as
input a pair of unsegmented shapes representing two
different articulation states of two functionally related
objects, and induces their common parts along with their
underlying rigid motion. This is a challenging setting, as
we assume no prior shape structure, no prior shape
category information, no consistent shape orientation, the
articulation states may belong to objects of different
geometry, plus we allow inputs to be noisy and partial
scans, or point clouds lifted from RGB images. Our method
learns a neural network architecture with three modules
that respectively propose correspondences, estimate 3D
deformation flows, and perform segmentation. To achieve
optimal performance, our architecture alternates between
correspondence, deformation flow, and segmentation
prediction iteratively in an ICP-like fashion. Our results
demonstrate that our method significantly outperforms
state-of-the-art techniques in the task of discovering
articulated parts of objects. In addition, our part
induction is object-class agnostic and successfully
generalizes to new and unseen objects. <br>
</span> </p>
</div>
</td>
</tr>
<tr valign="middle">
<td class="ptd" width="270" valign="middle"><a href="https://people.umass.edu/%7Eyangzhou/visemenet/"><img
src="papers/siggraph2018.jpg" alt="Image" width="300"></a></td>
<td class="ptd" height="225" valign="middle" align="right">
<div align="justify">
<p><span><strong>VisemeNet: Audio-Driven Animator-Centric
Speech Animation</strong><br>
</span><i>[</i><span><i><a href="https://kalo-ai.github.io/papers/visemenet/visemenet.pdf">PAPER</a>][<a
href="https://youtu.be/kk2EnyMD3mo">VIDEO</a>][<a href="https://people.umass.edu/%7Eyangzhou/visemenet/">PAGE
WITH CODE & DATA</a>]</i></span><span><br>
</span><span><em><span class="student-color">Yang Zhou</span>, <span class="student-color">Zhan Xu</span>, Chris Landreth, Evangelos Kalogerakis, Subhransu Maji, Karan Singh</em><br>