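Recommended reading:

ICCV2021 Latest News and Paper Downloads (Papers/Codes/Project/PaperReading/Demos/Livestreams/Paper-Sharing Sessions, etc.)

Official website: http://iccv2021.thecvf.com
Dates: October 11-17, 2021
Paper acceptance announcement: July 22, 2021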

1. ICCV2021 Accepted Papers/Code by Topic (Updating)

Categories:

1. Detection

2D Object Detection
Video Object Detection
3D Object Detection
Human-Object Interaction (HOI) Detection
Camouflaged Object Detection
Rotated Object Detection
Salient Object Detection
Image Anomaly Detection
Keypoint Detection

2. Segmentation

Image Segmentation
Panoptic Segmentation
Semantic Segmentation
Instance Segmentation
Superpixel
Video Object Segmentation
Matting
Dense Prediction

3. Image Processing

Super Resolution
Image Restoration/Image Enhancement
Image Shadow Removal/Image Reflection Removal
Image Denoising/Deblurring/Deraining/Dehazing
Image Editing/Image Inpainting
Image Translation
Image Quality Assessment
Style Transfer

4. Estimation

Pose Estimation
Gesture Estimation
Optical Flow/Pose/Motion Estimation
Depth Estimation

5. Image & Video Retrieval/Video Understanding

Action/Activity Recognition/Detection/Segmentation
Person Re-Identification/Detection
Image/Video Captioning

6. Face

Facial Recognition/Detection
Face Generation/Face Synthesis/Face Reconstruction/Face Editing
Face Forgery/Face Anti-Spoofing

7. 3D Vision

Point Cloud
3D Reconstruction

8. Object Tracking

9. Medical Imaging

10. Text Detection/Recognition

11. Remote Sensing Image

12. GAN/Generative/Adversarial

13. Image Generation/Image Synthesis

View Synthesis

14. Scene Graph

Scene Graph Generation
Scene Graph Prediction
Scene Graph Understanding

15. Visual Localization

Image Matching

16. Visual Reasoning/VQA

17. Image Classification

18. Neural Network Design & Optimization

CNN
Attention
Transformer
Graph Neural Network (GNN)
Neural Architecture Search (NAS)
Loss Function

19. Model Compression

Knowledge Distillation
Pruning
Quantization

20. Model Training/Generalization

Noisy Label
Long-Tailed Distribution

21. Model Evaluation

22. Data Processing

Data Augmentation
Representation Learning
Normalization/Regularization (Batch Normalization)
Image Clustering
Image Compression
Anomaly Detection

23. Active Learning

24. Few-shot/Zero-shot Learning

25. Continual Learning/Lifelong Learning

26. Transfer Learning/Domain Adaptation

27. Metric Learning

28. Contrastive Learning

29. Incremental Learning

30. Reinforcement Learning

31. Meta Learning

32. Multi-Modal Learning

Audio-visual Learning

33. Vision-based Prediction

34. Dataset

Uncategorized

Detection

2D Object Detection

[6] SimROD: A Simple Adaptation Method for Robust Object Detection
paper

[5] Active Learning for Deep Object Detection via Probabilistic Modeling
paper

[4] Detecting Invisible People
paper | project | video

[3] Conditional Variational Capsule Network for Open Set Recognition
paper | code

[2] MDETR: Modulated Detection for End-to-End Multi-Modal Understanding(Oral)
paper | code | project | colab

[1] DetCo: Unsupervised Contrastive Learning for Object Detection
paper | code

3D Object Detection

[1] Unsupervised Domain Adaptive 3D Detection with Multi-Level Consistency
paper

Image Anomaly Detection

[1] Divide-and-Assemble: Learning Block-wise Memory for Unsupervised Anomaly Detection
paper

Segmentation

Image Segmentation

[2] Labels4Free: Unsupervised Segmentation using StyleGAN
paper | code | project

[1] Mining Latent Classes for Few-shot Segmentation(Oral)
paper | code

Instance Segmentation

[2] Crossover Learning for Fast Online Video Instance Segmentation
code

[1] Instances as Queries
paper | code

Semantic Segmentation

[6] Leveraging Auxiliary Tasks with Affinity Learning for Weakly Supervised Semantic Segmentation
paper

[5] ReDAL: Region-based and Diversity-aware Active Learning for Point Cloud Semantic Segmentation(point cloud semantic segmentation)
paper

[4] Domain Adaptive Video Segmentation via Temporal Consistency Regularization(video semantic segmentation)
paper | code

[3] Standardized Max Logits: A Simple yet Effective Approach for Identifying Unexpected Road Obstacles in Urban-Scene Segmentation(Oral)
paper

[2] Re-distributing Biased Pseudo Labels for Semi-supervised Semantic Segmentation: A Baseline Investigation(Oral)
paper | code

[1] Calibrated Adversarial Refinement for Stochastic Semantic Segmentation
paper | code

Face

Facial Recognition/Detection

Face Generation/Face Synthesis/Face Reconstruction/Face Editing

[3] MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement(audio-driven facial animation)
paper | video

[2] Focal Frequency Loss for Image Reconstruction and Synthesis
paper | code

[1] HeadGAN: One-shot Neural Head Synthesis and Editing
paper

3D Vision

[1] Score-Based Point Cloud Denoising
paper

Point Cloud

[1] HRegNet: A Hierarchical Network for Large-scale Outdoor LiDAR Point Cloud Registration
paper | project

3D Reconstruction

[1] PlaneTR: Structure-Guided Transformers for 3D Plane Recovery
paper | code

Neural Network Structure Design & Optimization

[3] Energy-Based Open-World Uncertainty Modeling for Confidence Calibration(confidence calibration)
paper

[2] Learning to Resize Images for Computer Vision Tasks
paper

[1] Bias Loss for Mobile Neural Networks
paper

Attention

[2] SCOUTER: Slot Attention-based Classifier for Explainable Image Recognition
paper | code

[1] FcaNet: Frequency Channel Attention Networks
paper | code

Transformer

[4] AutoFormer: Searching Transformers for Visual Recognition
paper | code

[3] Rethinking Spatial Dimensions of Vision Transformers
paper | code

[2] Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers(Oral)
paper | code

[1] Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions(Oral)
paper | code
Commentary: Pyramid Vision Transformer (PVT): A Versatile Backbone for Dense Prediction

Neural Architecture Search (NAS)

[1] AutoFormer: Searching Transformers for Visual Recognition
paper | code

Loss Function

[3] Rank & Sort Loss for Object Detection and Instance Segmentation(Oral)
paper | code

[2] Focal Frequency Loss for Image Reconstruction and Synthesis
paper | code

[1] Orthogonal Projection Loss
paper | code

Image Generation/Image Synthesis

[3] A Light Stage on Every Desk
paper | project

[2] Handwriting Transformers
paper

[1] On Generating Transferable Targeted Perturbations
paper | code

GAN/Generative/Adversarial

[6] Learnable Boundary Guided Adversarial Training
paper | code

[5] Transporting Causal Mechanisms for Unsupervised Domain Adaptation(Oral)
paper

[4] Robustness via Cross-Domain Ensembles(Oral)
paper | code | model | homepage | video

[3] HeadGAN: One-shot Neural Head Synthesis and Editing
paper

[2] Labels4Free: Unsupervised Segmentation using StyleGAN
paper | code | project

[1] EigenGAN: Layer-Wise Eigen-Learning for GANs
paper | code

Image Processing

[2] Accelerating Atmospheric Turbulence Simulation via Learned Phase-to-Space Transform
paper

[1] Equivariant Imaging: Learning Beyond the Range Space(Oral)
paper

Super Resolution

[1] Learning for Scale-Arbitrary Super-Resolution from Scale-Specific Networks
paper | code

Style Transfer

[2] ALADIN: All Layer Adaptive Instance Normalization for Fine-grained Style Similarity(style transfer)
paper | code

[1] Multiple Heads are Better than One: Few-shot Font Generation with Multiple Localized Experts(font generation)
paper | code

Estimation

Human Pose Estimation

[3] Human Pose Regression with Residual Log-likelihood Estimation(Oral)
paper | code

[2] PyMAF: 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop(Oral)
paper | code | project

[1] HuMoR: 3D Human Motion Model for Robust Pose Estimation(Oral)
paper | video | project

Depth Estimation

[1] MonoIndoor: Towards Good Practice of Self-Supervised Monocular Depth Estimation for Indoor Environments
paper

Image & Video Retrieval/Video Understanding

[2] Hand Image Understanding via Deep Multi-Task Learning(hand image understanding)
paper

[1] Cross-Sentence Temporal and Semantic Relations in Video Activity Localisation
paper

Action/Activity Recognition/Detection/Segmentation

[2] Enriching Local and Global Contexts for Temporal Action Localization
paper

[1] Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition
paper | code

Person Re-Identification/Detection

[2] Spatio-Temporal Representation Factorization for Video-based Person Re-Identification
paper

[1] TransReID: Transformer-based Object Re-Identification
paper | code
Commentary: An overwhelming win for Transformers: Alibaba & Zhejiang University propose TransReID, leading across ReID tasks

Visual Localization

[3] Normalization Matters in Weakly Supervised Object Localization
paper

[2] TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization
paper | code

[1] Boundary-sensitive Pre-training for Temporal Localization in Videos
paper

Image Matching

[2] Warp Consistency for Unsupervised Learning of Dense Correspondences(Oral)
paper | code

[1] COTR: Correspondence Transformer for Matching Across Images
paper

3D Vision

[1] MVTN: Multi-View Transformation Network for 3D Shape Recognition
paper

Object Tracking

[2] Learning to Adversarially Blur Visual Object Tracking
paper | code

[1] Detecting Invisible People
paper | project | video

Medical Imaging

[1] Generative Adversarial Registration for Improved Conditional Deformable Templates
paper | code | homepage

Text Detection/Recognition

[4] Adaptive Boundary Proposal Network for Arbitrary Shape Text Detection
paper

[3] Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text Recognition
paper

[2] Text is Text, No Matter What: Unifying Text Recognition using Knowledge Distillation
paper

[1] Towards the Unseen: Iterative Text Recognition by Distilling from Errors
paper

Remote Sensing Image

[3] Change is Everywhere: Single-Temporal Supervised Object Change Detection for High Spatial Resolution Remote Sensing Imagery(change detection)
code

[2] Geography-Aware Self-Supervised Learning
paper

[1] Seasonal Contrast: Unsupervised Pre-Training from Uncurated Remote Sensing Data(transfer learning)
paper | code

Scene Graph

Scene Graph Generation

[2] Spatial-Temporal Transformer for Dynamic Scene Graph Generation
paper

[1] Unconstrained Scene Generation with Locally Conditioned Radiance Fields
paper

Scene Graph Prediction

[1] Generative Compositional Augmentations for Scene Graph Prediction
paper | code

Data Processing

Data Augmentation

[1] MixMo: Mixing Multiple Inputs for Multiple Outputs via Deep Subnetworks
paper

Anomaly Detection

[1] Weakly-supervised Video Anomaly Detection with Robust Temporal Feature Magnitude Learning
paper | code

Representation Learning

[1] In-Place Scene Labelling and Understanding with Implicit Scene Representation(Oral)
paper | project

Normalization/Regularization (Batch Normalization)

Image Clustering

[3] Graph Constrained Data Representation Learning for Human Motion Segmentation(human motion segmentation)
paper

[2] Improve Unsupervised Pretraining for Few-label Transfer
paper

[1] Clustering by Maximizing Mutual Information Across Views
paper

Transfer Learning/Domain Adaptation

[6] Adversarial Unsupervised Domain Adaptation with Conditional and Label Shift: Infer, Align and Iterate
paper

[5] Recursively Conditional Gaussian for Ordinal Unsupervised Domain Adaptation(Oral)
paper

[4] Improve Unsupervised Pretraining for Few-label Transfer
paper

[3] Generalized Source-free Domain Adaptation
homepage | code

[2] Seasonal Contrast: Unsupervised Pre-Training from Uncurated Remote Sensing Data(transfer learning)
paper | code

[1] Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling(transfer learning)
paper

Metric Learning

[1] Learning with Memory-based Virtual Classes for Deep Metric Learning
paper

Incremental Learning

[1] Always Be Dreaming: A New Approach for Data-Free Class-Incremental Learning
paper | code | project

Contrastive Learning

[3] Parametric Contrastive Learning
paper | code

[2] Geography-Aware Self-Supervised Learning
paper

[1] CoMatch: Semi-supervised Learning with Contrastive Graph Regularization
paper | code

Active Learning

[1] Active Learning for Deep Object Detection via Probabilistic Modeling
paper

Visual Reasoning/VQA

[3] Greedy Gradient Ensemble for Robust Visual Question Answering
paper | code

[2] On the hidden treasure of dialog in video question answering
paper

[1] Just Ask: Learning to Answer Questions from Millions of Narrated Videos(Oral)
paper | code | project

Vision-based Prediction

[1] On Exposing the Challenging Long Tail in Future Prediction of Traffic Actors
paper | code

Dataset

[1] 4DComplete: Non-Rigid Motion Estimation Beyond the Observable Surface(4D reconstruction)
paper | dataset | video

Uncategorized

Spatial Uncertainty-Aware Semi-Supervised Crowd Counting(crowd counting)
paper

Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework(Oral)(crowd counting)
paper | code

Uniformity in Heterogeneity: Diving Deep into Count Interval Partition for Crowd Counting(crowd counting)
paper | code

Self-Conditioned Probabilistic Learning of Video Rescaling(video compression)
paper

Mixed SIGNals: Sign Language Production via a Mixture of Motion Primitives(gesture generation)
paper

Temporal-wise Attention Spiking Neural Networks for Event Streams Classification
paper

Long-Term Temporally Consistent Unpaired Video Translation from Simulated Surgical 3D Data(video translation/medical/video synthesis)
paper

Pathdreamer: A World Model for Indoor Navigation(visual navigation)
paper

iPOKE: Poking a Still Image for Controlled Stochastic Video Synthesis
paper | code | project

Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis
paper | project

KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs
paper | code

2. ICCV2021 Oral (Updating)

[18] Recursively Conditional Gaussian for Ordinal Unsupervised Domain Adaptation(Oral)
paper

[17] Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework(Oral)(crowd counting)
paper | code

[16] Rank & Sort Loss for Object Detection and Instance Segmentation(Oral)
paper | code

[15] Transporting Causal Mechanisms for Unsupervised Domain Adaptation(Oral)
paper

[14] Standardized Max Logits: A Simple yet Effective Approach for Identifying Unexpected Road Obstacles in Urban-Scene Segmentation(Oral)
[paper](https://arxiv.org/abs/2107.11264)

[13] Re-distributing Biased Pseudo Labels for Semi-supervised Semantic Segmentation: A Baseline Investigation(Oral)
paper | code

[12] Human Pose Regression with Residual Log-likelihood Estimation(Oral)
paper | code

[11] Robustness via Cross-Domain Ensembles(Oral)
paper | code | model | homepage

[10] Warp Consistency for Unsupervised Learning of Dense Correspondences(Oral)
paper | code

[9] PyMAF: 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop(Oral)
paper | code | project

[8] HuMoR: 3D Human Motion Model for Robust Pose Estimation(Oral)
paper | video | project

[7] Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers(Oral)
paper | code

[6] Equivariant Imaging: Learning Beyond the Range Space(Oral)
paper

[5] MDETR: Modulated Detection for End-to-End Multi-Modal Understanding(Oral)
paper | code | project | colab

[4] Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions(Oral)
paper | code
Commentary: Pyramid Vision Transformer (PVT): A Versatile Backbone for Dense Prediction

[3] Mining Latent Classes for Few-shot Segmentation(Oral)
paper | code

[2] In-Place Scene Labelling and Understanding with Implicit Scene Representation(Oral)
paper | project

[1] Just Ask: Learning to Answer Questions from Millions of Narrated Videos(Oral)
paper | code

3. ICCV2021 Paper Commentary Roundup (Updating)

[2] TransReID: Transformer-based Object Re-Identification
paper | code
Commentary: An overwhelming win for Transformers: Alibaba & Zhejiang University propose TransReID, leading across ReID tasks

[1] Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions(Oral)
paper | code
Commentary: Pyramid Vision Transformer (PVT): A Versatile Backbone for Dense Prediction
