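Official website: http://iccv2021.thecvf.com
Dates: October 11-17, 2021
Paper acceptance notification: July 22, 2021
Related pages:
1. ICCV2021 accepted papers/code by topic (being updated)
2. ICCV2021 Oral (being updated)
3. ICCV2021 paper interpretations (being updated)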
- 2D Object Detection
- Video Object Detection
- 3D Object Detection
- Human-Object Interaction Detection (HOI Detection)
- Camouflaged Object Detection
- Rotated Object Detection
- Salient Object Detection
- Image Anomaly Detection
- Keypoint Detection
- Image Segmentation
- Panoptic Segmentation
- Semantic Segmentation
- Instance Segmentation
- Superpixel
- Video Object Segmentation
- Matting
- Dense Prediction
- Super Resolution
- Image Restoration/Image Enhancement
- Image Shadow Removal/Image Reflection Removal
- Image Denoising/Deblurring/Deraining/Dehazing
- Image Editing/Image Inpainting
- Image Translation
- Image Quality Assessment
- Style Transfer
- Pose Estimation
- Gesture Estimation
- Optical Flow/Pose/Motion Estimation
- Depth Estimation
- Action/Activity Recognition, Detection, and Segmentation
- Person Re-Identification/Detection
- Image/Video Captioning
- Face Recognition/Detection
- Face Generation/Face Synthesis/Face Reconstruction/Face Editing
- Face Forgery/Face Anti-Spoofing
- Data Augmentation
- Representation Learning
- Normalization/Regularization (Batch Normalization)
- Image Clustering
- Image Compression
- Anomaly Detection
[6] SimROD: A Simple Adaptation Method for Robust Object Detection
paper
[5] Active Learning for Deep Object Detection via Probabilistic Modeling
paper
[4] Detecting Invisible People
paper | project | video
[3] Conditional Variational Capsule Network for Open Set Recognition
paper | code
[2] MDETR: Modulated Detection for End-to-End Multi-Modal Understanding (Oral)
paper | code | project | colab
[1] DetCo: Unsupervised Contrastive Learning for Object Detection
paper | code
[1] Unsupervised Domain Adaptive 3D Detection with Multi-Level Consistency
paper
[1] Divide-and-Assemble: Learning Block-wise Memory for Unsupervised Anomaly Detection
paper
[2] Labels4Free: Unsupervised Segmentation using StyleGAN
paper | code | project
[1] Mining Latent Classes for Few-shot Segmentation(Oral)
paper | code
[2] Crossover Learning for Fast Online Video Instance Segmentation
code
[1] Instances as Queries
paper | code
[6] Leveraging Auxiliary Tasks with Affinity Learning for Weakly Supervised Semantic Segmentation
paper
[5] ReDAL: Region-based and Diversity-aware Active Learning for Point Cloud Semantic Segmentation (point cloud semantic segmentation)
paper
[4] Domain Adaptive Video Segmentation via Temporal Consistency Regularization(video semantic segmentation)
paper | code
[3] Standardized Max Logits: A Simple yet Effective Approach for Identifying Unexpected Road Obstacles in Urban-Scene Segmentation(Oral)
paper
[2] Re-distributing Biased Pseudo Labels for Semi-supervised Semantic Segmentation: A Baseline Investigation(Oral)
paper | code
[1] Calibrated Adversarial Refinement for Stochastic Semantic Segmentation
paper | code
[3] MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement (audio-driven facial animation)
paper | video
[2] Focal Frequency Loss for Image Reconstruction and Synthesis
paper | code
[1] HeadGAN: One-shot Neural Head Synthesis and Editing
paper
[1] Score-Based Point Cloud Denoising
paper
[1] HRegNet: A Hierarchical Network for Large-scale Outdoor LiDAR Point Cloud Registration
paper | project
[1] PlaneTR: Structure-Guided Transformers for 3D Plane Recovery
paper | code
[1] Energy-Based Open-World Uncertainty Modeling for Confidence Calibration (confidence calibration)
paper
[2] Learning to Resize Images for Computer Vision Tasks
paper
[1] Bias Loss for Mobile Neural Networks
paper
[2] SCOUTER: Slot Attention-based Classifier for Explainable Image Recognition
paper | code
[1] FcaNet: Frequency Channel Attention Networks
paper | code
[4] AutoFormer: Searching Transformers for Visual Recognition
paper | code
[3] Rethinking Spatial Dimensions of Vision Transformers
paper | code
[2] Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers(Oral)
paper | code
[1] Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions(Oral)
paper | code
Interpretation: Pyramid Vision Transformer (PVT): a versatile backbone for dense prediction
[1] AutoFormer: Searching Transformers for Visual Recognition
paper | code
[3] Rank & Sort Loss for Object Detection and Instance Segmentation(Oral)
paper | code
[2] Focal Frequency Loss for Image Reconstruction and Synthesis
paper | code
[1] Orthogonal Projection Loss
paper | code
[3] A Light Stage on Every Desk
paper | project
[2] Handwriting Transformers
paper
[1] On Generating Transferable Targeted Perturbations
paper | code
[6] Learnable Boundary Guided Adversarial Training
paper | code
[5] Transporting Causal Mechanisms for Unsupervised Domain Adaptation(Oral)
paper
[4] Robustness via Cross-Domain Ensembles(Oral)
paper | code | model | homepage | video
[3] HeadGAN: One-shot Neural Head Synthesis and Editing
paper
[2] Labels4Free: Unsupervised Segmentation using StyleGAN
paper | code | project
[1] EigenGAN: Layer-Wise Eigen-Learning for GANs
paper | code
[2] Accelerating Atmospheric Turbulence Simulation via Learned Phase-to-Space Transform
paper
[1] Equivariant Imaging: Learning Beyond the Range Space(Oral)
paper
[1] Learning for Scale-Arbitrary Super-Resolution from Scale-Specific Networks
paper | code
[2] ALADIN: All Layer Adaptive Instance Normalization for Fine-grained Style Similarity (style transfer)
paper | code
[1] Multiple Heads are Better than One: Few-shot Font Generation with Multiple Localized Experts (font generation)
paper | code
[3] Human Pose Regression with Residual Log-likelihood Estimation(Oral)
paper | code
[2] PyMAF: 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop(Oral)
paper | code | project
[1] HuMoR: 3D Human Motion Model for Robust Pose Estimation(Oral)
paper | video | project
[1] MonoIndoor: Towards Good Practice of Self-Supervised Monocular Depth Estimation for Indoor Environments
paper
[2] Hand Image Understanding via Deep Multi-Task Learning (hand image understanding)
paper
[1] Cross-Sentence Temporal and Semantic Relations in Video Activity Localisation
paper
[2] Enriching Local and Global Contexts for Temporal Action Localization
paper
[1] Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition
paper | code
[2] Spatio-Temporal Representation Factorization for Video-based Person Re-Identification
paper
[1] TransReID: Transformer-based Object Re-Identification
paper | code
Interpretation: An overwhelming blow from Transformers: Alibaba and Zhejiang University propose TransReID, leading across all ReID tasks
[3] Normalization Matters in Weakly Supervised Object Localization
paper
[2] TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization
paper | code
[1] Boundary-sensitive Pre-training for Temporal Localization in Videos
paper
[2] Warp Consistency for Unsupervised Learning of Dense Correspondences(Oral)
paper | code
[1] COTR: Correspondence Transformer for Matching Across Images
paper
[1] MVTN: Multi-View Transformation Network for 3D Shape Recognition
paper
[2] Learning to Adversarially Blur Visual Object Tracking
paper | code
[1] Detecting Invisible People
paper | project | video
[1] Generative Adversarial Registration for Improved Conditional Deformable Templates
paper | code | homepage
[4] Adaptive Boundary Proposal Network for Arbitrary Shape Text Detection
paper
[3] Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text Recognition
paper
[2] Text is Text, No Matter What: Unifying Text Recognition using Knowledge Distillation
paper
[1] Towards the Unseen: Iterative Text Recognition by Distilling from Errors
paper
[3] Change is Everywhere: Single-Temporal Supervised Object Change Detection for High Spatial Resolution Remote Sensing Imagery (change detection)
code
[2] Geography-Aware Self-Supervised Learning
paper
[1] Seasonal Contrast: Unsupervised Pre-Training from Uncurated Remote Sensing Data (transfer learning)
paper | code
[2] Spatial-Temporal Transformer for Dynamic Scene Graph Generation
paper
[1] Unconstrained Scene Generation with Locally Conditioned Radiance Fields
paper
[1] Generative Compositional Augmentations for Scene Graph Prediction
paper | code
[1] MixMo: Mixing Multiple Inputs for Multiple Outputs via Deep Subnetworks
paper
[1] Weakly-supervised Video Anomaly Detection with Robust Temporal Feature Magnitude Learning
paper | code
[1] In-Place Scene Labelling and Understanding with Implicit Scene Representation(Oral)
paper | project
[3] Graph Constrained Data Representation Learning for Human Motion Segmentation (human motion segmentation)
paper
[2] Improve Unsupervised Pretraining for Few-label Transfer
paper
[1] Clustering by Maximizing Mutual Information Across Views
paper
[6] Adversarial Unsupervised Domain Adaptation with Conditional and Label Shift: Infer, Align and Iterate
paper
[5] Recursively Conditional Gaussian for Ordinal Unsupervised Domain Adaptation(Oral)
paper
[4] Improve Unsupervised Pretraining for Few-label Transfer
paper
[3] Generalized Source-free Domain Adaptation
homepage | code
[2] Seasonal Contrast: Unsupervised Pre-Training from Uncurated Remote Sensing Data (transfer learning)
paper | code
[1] Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling (transfer learning)
paper
[1] Learning with Memory-based Virtual Classes for Deep Metric Learning
paper
[1] Always Be Dreaming: A New Approach for Data-Free Class-Incremental Learning
paper | code | project
[3] Parametric Contrastive Learning
paper | code
[2] Geography-Aware Self-Supervised Learning
paper
[1] CoMatch: Semi-supervised Learning with Contrastive Graph Regularization
paper | code
[1] Active Learning for Deep Object Detection via Probabilistic Modeling
paper
[3] Greedy Gradient Ensemble for Robust Visual Question Answering
paper | code
[2] On the hidden treasure of dialog in video question answering
paper
[1] Just Ask: Learning to Answer Questions from Millions of Narrated Videos(Oral)
paper | code | project
[1] On Exposing the Challenging Long Tail in Future Prediction of Traffic Actors
paper | code
[1] 4DComplete: Non-Rigid Motion Estimation Beyond the Observable Surface (4D reconstruction)
paper | dataset | video
Spatial Uncertainty-Aware Semi-Supervised Crowd Counting (crowd counting)
paper
Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework (Oral) (crowd counting)
paper | code
Uniformity in Heterogeneity: Diving Deep into Count Interval Partition for Crowd Counting (crowd counting)
paper | code
Self-Conditioned Probabilistic Learning of Video Rescaling (video compression)
paper
Mixed SIGNals: Sign Language Production via a Mixture of Motion Primitives (gesture generation)
paper
Temporal-wise Attention Spiking Neural Networks for Event Streams Classification
paper
Long-Term Temporally Consistent Unpaired Video Translation from Simulated Surgical 3D Data (video translation/medical/video synthesis)
paper
Pathdreamer: A World Model for Indoor Navigation (visual navigation)
paper
iPOKE: Poking a Still Image for Controlled Stochastic Video Synthesis
paper | code | project
Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis
paper | project
KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs
paper | code
[18] Recursively Conditional Gaussian for Ordinal Unsupervised Domain Adaptation(Oral)
paper
[17] Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework (Oral) (crowd counting)
paper | code
[16] Rank & Sort Loss for Object Detection and Instance Segmentation(Oral)
paper | code
[15] Transporting Causal Mechanisms for Unsupervised Domain Adaptation (Oral)
paper
[14] Standardized Max Logits: A Simple yet Effective Approach for Identifying Unexpected Road Obstacles in Urban-Scene Segmentation(Oral)
[paper](https://arxiv.org/abs/2107.11264)
[13] Re-distributing Biased Pseudo Labels for Semi-supervised Semantic Segmentation: A Baseline Investigation(Oral)
paper | code
[12] Human Pose Regression with Residual Log-likelihood Estimation(Oral)
paper | code
[11] Robustness via Cross-Domain Ensembles(Oral)
paper | code | model | homepage
[10] Warp Consistency for Unsupervised Learning of Dense Correspondences(Oral)
paper | code
[9] PyMAF: 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop(Oral)
paper | code | project
[8] HuMoR: 3D Human Motion Model for Robust Pose Estimation(Oral)
paper | video | project
[7] Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers(Oral)
paper | code
[6] Equivariant Imaging: Learning Beyond the Range Space(Oral)
paper
[5] MDETR : Modulated Detection for End-to-End Multi-Modal Understanding(Oral)
paper | code | project | colab
[4] Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions(Oral)
paper | code
Interpretation: Pyramid Vision Transformer (PVT): a versatile backbone for dense prediction
[3] Mining Latent Classes for Few-shot Segmentation(Oral)
paper | code
[2] In-Place Scene Labelling and Understanding with Implicit Scene Representation(Oral)
paper | project
[1] Just Ask: Learning to Answer Questions from Millions of Narrated Videos(Oral)
paper | code
[2] TransReID: Transformer-based Object Re-Identification
paper | code
Interpretation: An overwhelming blow from Transformers: Alibaba and Zhejiang University propose TransReID, leading across all ReID tasks
[1] Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions(Oral)
paper | code
Interpretation: Pyramid Vision Transformer (PVT): a versatile backbone for dense prediction