- ETR: An Efficient Transformer for Re-ranking in Visual Place Recognition
- MixVPR: Feature Mixing for Visual Place Recognition
- Panoptic-Aware Image-to-Image Translation
- Image Translation
- Domain-to-Domain Translation
- Learning Across Domains and Devices: Style-Driven Source-Free Domain Adaptation in Clustered Federated Learning
⭐code - Federated Learning for Commercial Image Sources
- VL-Taboo: An Analysis of Attribute-based Zero-shot Capabilities of Vision-Language Models
⭐code - Learning by Hallucinating: Vision-Language Pre-training with Weak Supervision
- Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention
⭐code - GAFNet: A Global Fourier Self Attention Based Novel Network for multi-modal downstream tasks
- Weakly-Supervised Optical Flow Estimation for Time-of-Flight
- Rebalancing Gradient To Improve Self-Supervised Co-Training of Depth, Odometry and Optical Flow Predictions
⭐code - DCVNet: Dilated Cost Volume Networks for Fast Optical Flow
- MFCFlow : A Motion Feature Compensated Multi-Frame Recurrent Network for Optical Flow Estimation
- BrightFlow: Brightness-Change-Aware Unsupervised Learning of Optical Flow
⭐code - Towards Equivariant Optical Flow Estimation with Deep Learning (https://github.com/stsavian/equivariant_of_estimation)
- Learning Lightweight Neural Networks via Channel-Split Recurrent Convolution
- Meta-Learning for Adaptation of Deep Optical Flow Networks
- Searching Efficient Neural Architecture with Multi-resolution Fusion Transformer for Appearance-based Gaze Estimation
- Iris Localization
- Gaze Following
- Gaze Redirection
- Multi-view Tracking Using Weakly Supervised Human Motion Prediction
⭐code - Anticipative Feature Fusion Transformer for Multi-Modal Action Anticipation
⭐code - GliTr: Glimpse Transformers with Spatiotemporal Consistency for Online Action Prediction
- Back to MLP: A Simple Baseline for Human Motion Prediction
⭐code - Intention-Conditioned Long-Term Human Egocentric Action Anticipation
- Pedestrian Trajectory Prediction
- Grounding Scene Graphs on Natural Images via Visio-Lingual Message Passing
⭐code🏠project - Improving Predicate Representation in Scene Graph Generation by Self-Supervised Learning
- More Knowledge, Less Bias: Unbiasing Scene Graph Generation with Explicit Ontological Adjustment
⭐code - Composite Relationship Fields with Transformers for Scene Graph Generation
- Similarity Contrastive Estimation for Self-Supervised Soft Contrastive Learning
⭐code - Representation Disentanglement in Generative Models with Contrastive Learning
- Addressing Feature Suppression in Unsupervised Visual Representations
- Ego-Vehicle Action Recognition based on Semi-Supervised Contrastive Learning
- Ev-NeRF: Event Based Neural Radiance Field
- DDNeRF: Depth Distribution Neural Radiance Fields
- X-NeRF: Explicit Neural Radiance Field for Multi-Scene 360° Insufficient RGB-D Views
⭐code - Fast Differentiable Transient Rendering for Non-Line-of-Sight Reconstruction
⭐code - Compressing Explicit Voxel Grid Representations: fast NeRFs become also small
- Control-NeRF: Editable Feature Volumes for Scene Rendering and Manipulation
🏠project - Beyond RGB: Scene-Property Synthesis with Neural Radiance Fields
- Light Field
- Camera
- Interest Point Detection
- Rethinking Rotation in Self-Supervised Contrastive Learning: Adaptive Positive or Negative Data Augmentation
- AdvisIL - A Class-Incremental Learning Advisor
⭐code - FeTrIL: Feature Translation for Exemplar-Free Class-Incremental Learning
⭐code - Incremental Learning
- Whole-Body Motion Synthesis
- Asymmetric Student-Teacher Networks for Industrial Anomaly Detection
⭐code - Zero-Shot Versus Many-Shot: Unsupervised Texture Anomaly Detection
⭐code - No Shifted Augmentations (NSA): compact distributions for robust self-supervised Anomaly Detection
- GLAD: A Global-to-Local Anomaly Detector
- Road Anomaly Detection
- Anomaly Clustering
- Line Search-Based Feature Transformation for Fast, Stable, and Tunable Content-Style Control in Photorealistic Style Transfer
⭐code - RAST: Restorable Arbitrary Style Transfer via Multi-Restoration
- Dance Style Transfer with Cross-modal Transformer
📺video - Is Bigger Always Better? An Empirical Study on Efficient Architectures for Style Transfer and Beyond
- AudioViewer: Learning to Visualize Sounds
🏠project - Audio-Visual Event Localization
- Audio Denoising
- Audio-Visual Segmentation
- Sound Source Localization
- Speech Recognition
- Audio Separation
- Efficient Visual Tracking with Exemplar Transformers
⭐code - Hard to Track Objects with Irregular Motions and Similar Appearances? Make It Easier by Buffering the Matching Space
- HOOT: Heavy Occlusions in Object Tracking Benchmark
- VirtualHome Action Genome: A Simulated Spatio-Temporal Scene Graph Dataset With Consistent Relationship Labels
- Tracking Growth and Decay of Plant Roots in Minirhizotron Images
⭐code - Planar Object Tracking via Weighted Optical Flow
- Multi-Frame Attention with Feature-Level Warping for Drone Crowd Tracking
- Multi-Object Tracking
- AttTrack: Online Deep Attention Transfer for Multi-object Tracking
- Detection Recovery in Online Multi-Object Tracking With Sparse Graph Tracker
⭐code - MMPTRACK: Large-scale Densely Annotated Multi-camera Multiple People Tracking Benchmark
- TransMOT: Spatial-Temporal Graph Transformer for Multiple Object Tracking
- Finger Vein Recognition
- Misclassifications of Contact-Lens Iris PAD Algorithms
- Biometric Recognition
- Iris
- DRAMA: Joint Risk Localization and Captioning in Driving
- VLC-BERT: Visual Question Answering with Contextualized Commonsense Knowledge
⭐code - Barlow constrained optimization for Visual Question Answering
⭐code - How To Practice VQA on a Resource-Limited Target Domain
🏠project - Guiding Visual Question Answering With Attention Priors
- VideoQA
- Visual Question Generation
- AR
- Vision Transformer for NeRF-Based View Synthesis From a Single Input Image
- Self-improving Multiplane-to-layer Images for Novel View Synthesis
- Continual Learning with Dependency Preserving Hypernetworks
- Do Pre-trained Models Benefit Equally in Continual Learning
⭐code - Saliency Guided Experience Packing for Replay in Continual Learning
- Switching to Discriminative Image Captioning by Relieving a Bottleneck of Reinforcement Learning
- Wavelength-Aware 2D Convolutions for Hyperspectral Imaging
⭐code - ML-Decoder: Scalable and Versatile Classification Head
- CNN2Graph: Building Graphs for Image Classification
- Token Pooling in Vision Transformers for Image Classification
- Augmentation by Counterfactual Explanation - Fixing an Overconfident Classifier
- Treatment Learning Causal Transformer for Noisy Image Classification
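⭐code - Long-Tailed Recognition
- Open-Set Classification
- Fine-Grained Classification
- Multi-Label Classification
- Few-Shot Classification
- Pedestrian Analysis
- Person Search
- Re-ID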
- Camera Alignment and Weighted Contrastive Learning for Domain Adaptation in Video Person ReID
⭐code - MEVID: Multi-view Extended Videos with Identities for Video Person Re-Identification
⭐code - Feature Disentanglement Learning with Switching and Aggregation for Video-based Person Re-Identification
- Graph-Based Self-Learning for Robust Person Re-Identification
- Body Part-Based Representation Learning for Occluded Person Re-Identification
- Gait Recognition
- Gait Transfer
- Suspect Identification
- Crowd Counting
- OpenEarthMap: A Benchmark Dataset for Global High-Resolution Land Cover Mapping
🌻dataset - A Continual Deepfake Detection Benchmark: Dataset, Methods, and Essentials
🌻dataset - The CropAndWeed Dataset: a Multi-Modal Learning Approach for Efficient Crop and Weed Manipulation
🌻dataset - IDD-3D: A Dataset for Driving in Unstructured Road Scenes
🌻dataset - Vis2Rec: A Large-Scale Visual Dataset for Visit Recommendation
🌻dataset - Creating a Forensic Database of Shoeprints from Online Shoe-Tread Photos
🌻dataset - Object Detection, Segmentation, and Tracking
- Human Image Analysis
- Image Captioning
- Video Captioning
- Boosting vision transformers for image retrieval
⭐code - Certified Defense for Content Based Image Retrieval
- Fashion Image Retrieval with Text Feedback by Additive Attention Compositional Learning
- Content-Based Music-Image Retrieval Using Self- and Cross-Modal Feature Embedding Memory
- Image-Sentence Retrieval
- Image-Text Retrieval
- Cross-Domain Retrieval
- Image-Text Matching
- IDD-3D: Indian Driving Dataset for 3D Unstructured Road Scenes
⭐code - PP4AV: A Benchmarking Dataset for Privacy-Preserving Autonomous Driving
⭐code - Benchmarking Visual Localization for Autonomous Navigation
⭐code - Vehicle Re-Identification
- Lane Detection
- Trajectory Prediction
- Action Recognition
- Modality Mixer for Multi-modal Action Recognition
- STAR-Transformer: A Spatio-temporal Cross Attention Transformer for Human Action Recognition
- Holistic Interaction Transformer Network for Action Detection
⭐code - Reconstructing Humpty Dumpty: Multi-feature Graph Autoencoder for Open Set Action Recognition
⭐code - DA-AIM: Exploiting Instance-based Mixed Sampling via Auxiliary Source Domain Supervision for Domain-adaptive Action Detection
⭐code - Spatio-Temporal Action Detection Under Large Motion
⭐code - Efficient Skeleton-Based Action Recognition via Joint-Mapping Strategies
- Recur, Attend or Convolve? On Whether Temporal Modeling Matters for Cross-Domain Robustness in Action Recognition
- Semantics Guided Contrastive Learning of Transformers for Zero-Shot Temporal Activity Detection
- Adaptive Local-Component-Aware Graph Convolutional Network for One-Shot Skeleton-Based Action Recognition
- Multi-View Action Recognition using Contrastive Learning
⭐code - Stop or Forward: Dynamic Layer Skipping for Efficient Action Recognition
- A Simple and Efficient Pipeline to Build an End-to-End Spatial-Temporal Action Detector
- Temporal Action Localization
- PointNeuron: 3D Neuron Reconstruction via Geometry and Topology Learning of Point Clouds
- Visualizing Global Explanations of Point Cloud DNNs
⭐code - RSF: Optimizing Rigid Scene Flow From 3D Point Clouds Without Labels
- Nearest Neighbors Meet Deep Neural Networks for Point Cloud Analysis
- Explainability-Aware One Point Attack for Point Cloud Neural Networks
⭐code - Centroid Distance Keypoint Detector for Colored Point Clouds
⭐code - Point Cloud Classification
- Point Cloud Segmentation
- Point Cloud Registration
- Point Cloud Reconstruction
- 3D Point Clouds
- EmbryosFormer: Deformable Transformer and Collaborative Encoding-Decoding for Embryos Stage Development Classification
⭐code - Delving into Masked Autoencoders for Multi-Label Thorax Disease Classification
⭐code - Accumulated Trivial Attention Matters in Vision Transformers on Small Datasets
⭐code - Couplformer: Rethinking Vision Transformer With Coupling Attention
- Trans4Map: Revisiting Holistic Bird's-Eye-View Mapping From Egocentric Images to Allocentric Semantics With Vision Transformers
⭐code - PatchDropout: Economizing Vision Transformers Using Patch Dropout
⭐code - OutfitTransformer: Learning Outfit Representations for Fashion Recommendation
- Discrete Cosin TransFormer: Image Modeling From Frequency Domain
- Orthogonal Transforms For Learning Invariant Representations In Equivariant Neural Networks
- Pruning
- Knowledge Distillation
- Collaborative Multi-Teacher Knowledge Distillation for Learning Low Bit-width Deep Neural Networks
⭐code - Understanding the Role of Mixup in Knowledge Distillation: An Empirical Study
⭐code - TinyHD: Efficient Video Saliency Prediction with Heterogeneous Decoders using Hierarchical Maps Distillation
- [Online Knowledge Distillation for Multi-task Learning](https://openaccess.thecvf.com/content/WACV2023/papers/Jacob_Online_Knowledge_Distillation_for_Multi-Task_Learning_WACV_2023_paper.pdf)
- Adversarial local distribution regularization for knowledge distillation
- Self-Distillation
- DC
- Quantization
- Lightweight Models
- Revisiting Training-free NAS Metrics: An Efficient Training-based Method
⭐code - SVD-NAS: Coupling Low-Rank Approximation and Neural Architecture Search
⭐code - FreeREA: Training-Free Evolution-based Architecture Search
⭐code - Toward Edge-Efficient Dense Predictions with Synergistic Multi-Task Neural Architecture Search
- OCR-VQGAN: Taming Text-within-Image Generation
⭐code - Efficient few-shot learning for pixel-precise handwritten document layout analysis
- D-Extract: Extracting Dimensional Attributes From Product Images
⭐code - Text Recognition
- Table Detection
- Logo Detection
- Document Detection
- Document Understanding
- Text Erasing
- Single Image Super-Resolution via a Dual Interactive Implicit Neural Network
- HIME: Efficient Headshot Image Super-Resolution with Multiple Exemplars
- Deep Model-Based Super-Resolution With Non-Uniform Blur
⭐code - Kernel-Aware Burst Blind Super-Resolution
- Enriched CNN-Transformer Feature Aggregation Networks for Super-Resolution
⭐code - Joint Video Rolling Shutter Correction and Super-Resolution
- Video Super-Resolution
- One-Shot Synthesis of Images and Segmentation Masks
⭐code - Style-Guided Inference of Transformer for High-resolution Image Synthesis
- Evaluating Generative Networks Using Gaussian Mixtures of Image Features
- More Control for Free! Image Synthesis with Semantic Diffusion Guidance
- Image Generation
- Text-to-Image Synthesis
- Text-Guided Image Manipulation
- Self-Supervised Learning
- Self-Supervised Pyramid Representation Learning for Multi-Label Visual Analysis and Beyond
⭐code - Global-Local Self-Distillation for Visual Representation Learning
⭐code - Accelerating Self-Supervised Learning via Efficient Training Strategies
- FUSSL: Fuzzy Uncertain Self Supervised Learning
- Self-Supervised Correspondence Estimation via Multiview Registration
🏠project - Similarity Contrastive Estimation for Image and Video Soft Contrastive Self-Supervised Learning
- Self-Supervised Relative Pose With Homography Model-Fitting in the Loop
- Self-Distilled Self-supervised Representation Learning
⭐code - Multi-Level Contrastive Learning for Self-Supervised Vision Transformers
- Self-Supervised Distilled Learning for Multi-modal Misinformation Identification
- An Embedding-Dynamic Approach to Self-Supervised Learning
- Semi-Supervised Learning
- Class-Level Confidence Based 3D Semi-Supervised Learning
- Dynamic Re-Weighting for Long-Tailed Semi-Supervised Learning
- Unifying Distribution Alignment as a Loss for Imbalanced Semi-supervised Learning
⭐code - Semantics-Depth-Symbiosis: Deeply Coupled Semi-Supervised Learning of Semantics and Depth
- Semi-Supervised Learning for Sparsely-Labeled Sequential Data: Application to Healthcare Video Processing
- Unsupervised Learning
- Image Segmentation-based Unsupervised Multiple Objects Discovery
- WSNet: Towards An Effective Method for Wound Image Segmentation
⭐code - Autoencoder-based background reconstruction and foreground segmentation with background noise estimation
- Unsupervised multi-object segmentation using attention and soft-argmax
- Semantic Segmentation
- Attribution-aware Weight Transfer: A Warm-Start Initialization for Class-Incremental Semantic Segmentation
⭐code - Full Contextual Attention for Multi-resolution Transformers in Semantic Segmentation
- Urban Scene Semantic Segmentation with Low-Cost Coarse Annotation
- LoopDA: Constructing Self-loops to Adapt Nighttime Semantic Segmentation
⭐code - Empirical Generalization Study: Unsupervised Domain Adaptation vs. Domain Generalization Methods for Semantic Segmentation in the Wild
- Semantic Segmentation with Active Semi-Supervised Learning
- Self-supervised Learning with Local Contrastive Loss for Detection and Semantic Segmentation
- Semantic Segmentation of Degraded Images Using Layer-Wise Feature Adjustor
- Reducing Annotation Effort by Identifying and Labeling Contextually Diverse Classes for Semantic Segmentation Under Domain Shift
⭐code - Cooperative Self-Training for Multi-Target Adaptive Semantic Segmentation
⭐code - Refign: Align and Refine for Adaptation of Semantic Segmentation to Adverse Conditions
⭐code - BEVSegFormer: Bird's Eye View Semantic Segmentation From Arbitrary Camera Rigs
- ProtoSeg: Interpretable Semantic Segmentation with Prototypical Parts
⭐code - Complementary Bi-directional Feature Compression for Indoor 360° Semantic Segmentation with Self-distillation
- Automated Detection of Label Errors in Semantic Segmentation Datasets via Deep Learning and Uncertainty Quantification
- Weakly Supervised Semantic Segmentation
- [Single Stage Weakly Supervised Semantic Segmentation of Complex Scenes](https://openaccess.thecvf.com/content/WACV2023/papers/Akiva_Single_Stage_Weakly_Supervised_Semantic_Segmentation_of_Complex_Scenes_WACV_2023_paper.pdf)
- Semi-Supervised Semantic Segmentation
- Multi-class part parsing
- BEV segmentation
- Panoptic Segmentation
- Instance Segmentation
- From Forks to Forceps: A New Framework for Instance Segmentation of Surgical Instruments
- CellTranspose: Few-shot Domain Adaptation for Cellular Instance Segmentation
- Weakly Supervised Cell-Instance Segmentation With Two Types of Weak Labels by Single Instance Pasting
- Self-Supervised Learning With Masked Image Modeling for Teeth Numbering, Detection of Dental Restorations, and Instance Segmentation in Dental Panoramic Radiographs
⭐code - Weakly-Supervised Point Cloud Instance Segmentation With Geometric Priors
- NeuralBF: Neural Bilateral Filtering for Top-down Instance Segmentation on Point Clouds
🏠project - SCTS: Instance Segmentation of Single Cells Using a Transformer-Based Semantic-Aware Model and Space-Filling Augmentation
- Few-Shot Segmentation
- Leaf Disease Segmentation
- Cell Segmentation
- Object Segmentation
- Image Matting
- Domain Adaptation
- Self-Distillation for Unsupervised 3D Domain Adaptation
🏠project - CoNMix for Source-free Single and Multi-target Domain Adaptation
⭐code🏠project - Learning Classifiers of Prototypes and Reciprocal Points for Universal Domain Adaptation
- Select, Label, and Mix: Learning Discriminative Invariant Feature Representations for Partial Domain Adaptation
🏠project - TVT: Transferable Vision Transformer for Unsupervised Domain Adaptation
⭐code - Backprop Induced Feature Weighting for Adversarial Domain Adaptation with Iterative Label Distribution Alignment
- Generative Alignment of Posterior Probabilities for Source-free Domain Adaptation
- Domain Generalization
- Intra-Source Style Augmentation for Improved Domain Generalization
- Center-aware Adversarial Augmentation for Single Domain Generalization
- FFM: Injecting Out-of-Domain Knowledge via Factorized Frequency Modification
- Improving Diversity with Adversarially Learned Transformations for Domain Generalization
- Zero-Shot Learning
- Few-Shot Learning
- Aggregating Bilateral Attention for Few-Shot Instance Localization
- HyperShot: Few-Shot Learning by Kernel HyperNetworks
⭐code - Few-Shot Learning of Compact Models via Task-Specific Meta Distillation
- Semantic Guided Latent Parts Embedding for Few-Shot Learning
⭐code - Self-Attention Message Passing for Contrastive Few-Shot Learning
- My Face My Choice: Privacy Enhancing Deepfakes for Social Media Anonymization
- Improving Deep Facial Phenotyping for Ultra-rare Disorder Verification Using Model Ensembles
⭐code - Lip Reading
- 3D Face
- Face Recognition
- DigiFace-1M: 1 Million Digital Face Images for Face Recognition
⭐code - CAST: Conditional Attribute Subsampling Toolkit for Fine-Grained Evaluation
⭐code - CYBORG: Blending Human Saliency Into the Loss Improves Deep Learning-Based Synthetic Face Detection
- Unifying Margin-Based Softmax Losses in Face Recognition
- Harnessing Unrecognizable Faces for Improving Face Recognition
- QMagFace: Simple and Accurate Quality-Aware Face Recognition
⭐code - A Quality Aware Sample-to-Sample Comparison for Face Recognition
- Face Inpainting/Restoration
- Face Swapping
- Facial Expression Recognition
- Face Reenactment
- Face Naming
- Face Reconstruction
- Face Synthesis
- Deepfake
- Facial Action Unit Detection
- Face Quality Assessment
- Face Anti-Spoofing
- Domain Invariant Vision Transformer Learning for Face Anti-Spoofing
- Expression-Based Facial Wrinkle Synthesis
- Text- and Image-Guided 3D Avatar Generation
- Talking Face
- Lip Reading
- Leveraging Local Patch Differences in Multi-Object Scenes for Generative Adversarial Attacks
- Inducing Data Amplification Using Auxiliary Datasets in Adversarial Training
⭐code - Interpreting Disparate Privacy-Utility Tradeoff in Adversarial Learning via Attribute Correlation
- FLOAT: Fast Learnable Once-for-All Adversarial Training for Tunable Trade-off between Accuracy and Robustness
- Adversarial robustness in discontinuous spaces via alternating sampling & descent
- PatchZero: Defending against Adversarial Patch Attacks by Detecting and Zeroing the Patch
- Avoiding Lingering in Learning Active Recognition by Adversarial Disturbance
- Adversarial Examples
- Active Attacks
- RS
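- Change Detection
- Aerial Image Detection
- Aerial Image Segmentation
- International Boundary Detection
- Image Quality Assessment
- Image Restoration
- Image Inpainting
- Image Enhancement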
- Perceptual Image Enhancement for Smartphone Real-Time Applications
⭐code - Robust Real-World Image Enhancement Based on Multi-Exposure LDR Images
- End-to-End Single-Frame Image Signal Processing for High Dynamic Range Scenes
- PSENet: Progressive Self-Enhancement Network for Unsupervised Extreme-Light Image Enhancement
- Image Colorization
- Guiding Users to Where to Give Color Hints for Efficient Interactive Sketch Colorization via Unsupervised Region Prioritization
- Generative Colorization of Structured Mobile Web Pages
⭐code - iColoriT: Towards Propagating Local Hints to the Right Region in Interactive Colorization by Leveraging Vision Transformer
🏠project - Pik-Fix: Restoring and Colorizing Old Photos
- Image Completion
- Image Rescaling
- HDR Reconstruction
- Denoising
- Dehazing
- Reflection Removal
- De-fencing
- Deconvolution
- Shadow Removal
- Kinematic-aware Hierarchical Attention Network for Human Pose Estimation in Videos
⭐code - HuPR: A Benchmark for Human Pose Estimation Using Millimeter Wave Radar
⭐code - Computer Vision to the Rescue: Infant Postural Symmetry Estimation from Incongruent Annotations
⭐code - Multi-Person Pose Estimation
- 3D Human
- Placing Human Animations into 3D Scenes by Learning Interaction- and Geometry-Driven Keyframes
- Uplift and Upsample: Efficient 3D Human Pose Estimation with Uplifting Transformers
⭐code - Learning 3D Human Pose Estimation from Dozens of Datasets using a Geometry-Aware Autoencoder to Bridge Between Skeleton Formats
🏠project - GarSim: Particle Based Neural Garment Simulator
- Learnable Human Mesh Triangulation for 3D Human Pose and Shape Estimation
- ElliPose: Stereoscopic 3D Human Pose Estimation by Fitting Ellipsoids
- Rethinking the Data Annotation Process for Multi-view 3D Pose Estimation with Active Learning and Self-Training
- CameraPose: Weakly-Supervised Monocular 3D Human Pose Estimation by Leveraging In-the-wild 2D Annotations
- Hand Pose
- 3D Hands
- Hand Reconstruction
- Hand-Object Pose Estimation
- A Deep Neural Framework to Detect Individual Advertisement (Ad) from Videos
- TCAM: Temporal Class Activation Maps for Object Localization in Weakly-Labeled Unconstrained Videos
⭐code - Recipe2Video: Synthesizing Personalized Videos from Recipe Texts
- Video Enhancement
- Video Understanding
- Video Summarization
- Multi-Person Detection
- Scene Recognition
- Video Grounding
- Video Anomaly Detection (VAD)
- DyAnNet: A Scene Dynamicity Guided Self-Trained Video Anomaly Detection Network
- Cross-Domain Video Anomaly Detection without Target Domain Adaptation
- Bi-Directional Frame Interpolation for Unsupervised Video Anomaly Detection
- Towards Interpretable Video Anomaly Detection
- Normality Guided Multiple Instance Learning for Weakly Supervised Video Anomaly Detection
- Image and Video Coding
- Universal Deep Image Compression via Content-Adaptive Optimization with Adapters
⭐code - A neural video codec with spatial rate-distortion control
- Boosting Neural Video Codecs by Exploiting Hierarchical Redundancy
- Neural Distributed Image Compression with Cross-Attention Feature Alignment
⭐code - Lossy Image Compression with Quantized Hierarchical VAEs
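- Video Portrait Synthesis
- Video Frame Interpolation
- Video Motion Retargeting
- Video Motion Magnification
- Video Stabilization
- Video Classification
- Video Segmentation
- Video Forgery Detection
- Video Tracking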
- ConfMix: Unsupervised Domain Adaptation for Object Detection via Confidence-based Mixing
⭐code - The Box Size Confidence Bias Harms Your Object Detector
⭐code - Resolving Class Imbalance for LiDAR-based Object Detector by Dynamic Weight Average and Contextual Ground Truth Sampling
- Is Your Noise Correction Noisy? PLS: Robustness To Label Noise With Two Stage Detection
⭐code - Phantom Sponges: Exploiting Non-Maximum Suppression to Attack Deep Object Detectors
- Domain Adaptive Object Detection for Autonomous Driving under Foggy Weather
⭐code - ROMA: Run-Time Object Detection To Maximize Real-Time Accuracy
- Towards Online Domain Adaptive Object Detection
⭐code - MT-DETR: Robust End-to-end Multimodal Detection with Confidence Fusion
⭐code - Towards Few-Annotation Learning for Object Detection: Are Transformer-based Models More Efficient?
- Scaling Novel Object Detection with Weakly Supervised Detection Transformers
- Mobile Robot Manipulation using Pure Object Detection
⭐code - Domain Adaptation Using Self-Training With Mixup for One-Stage Object Detection
- Gradient-Based Quantification of Epistemic Uncertainty for Deep Object Detectors
⭐code - Few-Shot Object Detection
- Weakly Supervised Object Detection
- 3D Object Detection
- TransPillars: Coarse-To-Fine Aggregation for Multi-Frame 3D Object Detection
- Adaptive Feature Fusion for Cooperative Perception Using LiDAR Point Clouds
- ImpDet: Exploring Implicit Fields for 3D Object Detection
- Li3DeTr: A LiDAR based 3D Detection Transformer
- Far3Det: Towards Far-Field 3D Detection
- Dense Voxel Fusion for 3D Object Detection
- MonoEdge: Monocular 3D Object Detection Using Local Perspectives
- Multivariate Probabilistic Monocular 3D Object Detection
⭐code - SAILOR: Scaling Anchors via Insights into Latent Object Representation
- Out-of-distribution Detection via Frequency-regularized Generative Models
⭐code - Heatmap-based Out-of-Distribution Detection
⭐code - Out-of-Distribution Detection with Reconstruction Error and Typicality-based Penalty
- Mixture Outlier Exposure: Towards Out-of-Distribution Detection in Fine-grained Environments
⭐code - Hyperdimensional Feature Fusion for Out-of-Distribution Detection
⭐code - Task Agnostic and Post-hoc Unseen Distribution Detection
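- Camouflaged Object Detection
- Object Discovery
- Change Detection
- Real-Time Concealed Weapon Detection on 3D Radar Images for Walk-Through Screening Systems
- Image Recognition
- Invasive Species Detection
- Ocean Eddy Detection in Infrared Images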
- HoechstGAN: Virtual Lymphocyte Staining Using Generative Adversarial Networks
- Image Completion with Heterogeneously Filtered Spectral Hints
⭐code - Indirect Adversarial Losses via an Intermediate Distribution for Training GANs
- SLI-pSp: Injecting Multi-Scale Spatial Layout in pSp
- Multi-scale Contrastive Learning for Complex Scene Generation
- Realistic Full-Body Anonymization with Surface-Guided GANs
- Fantastic Style Channels and Where to Find Them: A Submodular Framework for Discovering Diverse Directions in GANs
- 3D GAN Inversion with Pose Optimization
🏠project - UVCGAN: UNet Vision Transformer cycle-consistent GAN for unpaired image-to-image translation
⭐code - SketchInverter: Multi-Class Sketch-Based Image Generation via GAN Inversion
- Style Editing
- Fashion Attribute Editing
- Anonymization
- Fingerprint Generation
- Open-Set Recognition
- Generative Range Imaging for Learning Scene Priors of 3D LiDAR Data
⭐code - Seg&Struct: The Interplay Between Part Segmentation and Structure Inference for 3D Shape Parsing
- Surface normal estimation from optimized and distributed light sources using DNN-based photometric stereo
- Meta-Auxiliary Learning for Future Depth Prediction in Videos
- 3D Neural Sculpting (3DNS): Editing Neural Signed Distance Functions
- Improving the Robustness of Point Convolution on k-Nearest Neighbor Neighborhoods with a Viewpoint-Invariant Coordinate Transform
- CountNet3D: A 3D Computer Vision Approach to Infer Counts of Occluded Objects
- Learning Graph Variational Autoencoders with Constraints and Structured Priors for Conditional Indoor 3D Scene Generation
- 3D Reconstruction
- Surface Reconstruction
- Depth Estimation
- Frequency-Aware Self-Supervised Monocular Depth Estimation
⭐code - Attention Attention Everywhere: Monocular Depth Prediction with Skip Attention
⭐code - High-Resolution Depth Estimation for 360-degree Panoramas through Perspective and Panoramic Depth Images Registration
- Improving Pixel-Level Contrastive Learning by Leveraging Exogenous Depth Information
- Temporally Consistent Online Depth Estimation in Dynamic Scenes
⭐code - Self-Supervised Monocular Depth Estimation: Solving the Edge-Fattening Problem
⭐code - High-Resolution Depth Estimation for 360° Panoramas through Perspective and Panoramic Depth Images Registration
- Self-supervised Monocular Depth Estimation from Thermal Images via Adversarial Multi-spectral Adaptation
- Depth Completion
- Multi-View Photometric Stereo Revisited
- DELS-MVS: Deep Epipolar Line Search for Multi-View Stereo
- nLMVS-Net: Deep Non-Lambertian Multi-View Stereo
🏠project - 360MVSNet: Deep Multi-view Stereo Network with 360° Images for Indoor Scene Reconstruction
- Improving the Pair Selection and the Model Fusion Steps of Satellite Multi-View Stereo Pipelines
- RGB-D Reconstruction
- Stereo Matching
- Neural Radiance Fields
- 3D Localization
- DBCE: A Saliency Method for Medical Deep Learning Through Anatomically-Consistent Free-Form Deformations
- Representation Recovering for Self-Supervised Pre-training on Medical Images
- Magnification Prior: A Self-Supervised Method for Learning Representations on Breast Cancer Histopathological Images
⭐code - A Morphology Focused Diffusion Probabilistic Model for Synthesis of Histopathology Images
- 3D Medical Image Analysis
- Chest X-Ray Classification
- CT Image Fusion
- Medical Image Localization
- Medical Image Segmentation
- Few-shot Medical Image Segmentation with Cycle-resemblance Attention
- Medical Image Segmentation via Cascaded Attention Decoding
- HiFormer: Hierarchical Multi-scale Representations Using Transformers for Medical Image Segmentation
- Training Auxiliary Prototypical Classifiers for Explainable Anomaly Detection in Medical Image Segmentation
- The Fully Convolutional Transformer for Medical Image Segmentation
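⭐code - Lesion Segmentation
- Medical Image Classification
- Medical Image Super-Resolution
- Cardiovascular Detection
- Remote Heart Rate Estimation
- CT Reconstruction
- Melanocyte Detection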
- Instance-Dependent Noisy Label Learning via Graphical Modelling
- Color Recommendation for Vector Graphic Documents based on Multi-Palette Representation
- TeST: Test-time Self-Training under Distribution Shift
- Simultaneous Acquisition of High Quality RGB Image and Polarization Information using a Sparse Polarization Sensor
⭐code - Exemplar Guided Deep Neural Network for Spatial Transcriptomics Analysis of Gene Expression Prediction
- Enabling ISP-less Low-Power Computer Vision
- AdaNorm: Adaptive Gradient Norm Correction based Optimizer for CNNs
⭐code - Composite Learning for Robust and Effective Dense Predictions
⭐code - Modeling the Lighting in Scenes as Style for Auto White-Balance Correction
⭐code - DE-CROP: Data-efficient Certified Robustness for Pretrained Classifiers
🏠project - Anisotropic Multi-Scale Graph Convolutional Network for Dense Shape Correspondence
- ATCON: Attention Consistency for Vision Models
⭐code - LAVA: Label-efficient Visual Learning and Adaptation
- Interpolated SelectionConv for Spherical Images and Surfaces
- Weakly Supervised Annotations for Multi-modal Greeting Cards Dataset
- Multimodal Vision Transformers with Forced Attention for Behavior Analysis
⭐code - Compact and Optimal Deep Learning with Recurrent Parameter Generators
- Motif Mining: Finding and Summarizing Remixed Image Content
- LINEEX: Data Extraction from Scientific Line Charts
⭐code - Neural Implicit Representations for Physical Parameter Inference From a Single Video
🏠project - Physically Plausible Animation of Human Upper Body from a Single Image
- Partially Calibrated Semi-Generalized Pose From Hybrid Point Correspondences
- Learning How to MIMIC: Using Model Explanations To Guide Deep Learning Training
⭐code - Robust and Efficient Alignment of Calcium Imaging Data through Simultaneous Low Rank and Sparse Decomposition
- Improving Multi-Fidelity Optimization With a Recurring Learning Rate for Hyperparameter Tuning
- What can we Learn by Predicting Accuracy?
⭐code - Jointly Learning Band Selection and Filter Array Design for Hyperspectral Imaging
- LCS: Learning Compressible Subspaces for Efficient, Adaptive, Real-Time Network Compression at Inference Time
⭐code - Self-Attentive Pooling for Efficient Deep Learning
⭐code - Fine-Grained Activities of People Worldwide
🏠project - Relaxing Contrastiveness in Multimodal Representation Learning
- Spike-Based Anytime Perception
- Towards Disturbance-Free Visual Mobile Manipulation
🏠project - SERF: Towards Better Training of Deep Neural Networks Using Log-Softplus ERror Activation Function
- RADIANT: Better rPPG Estimation Using Signal Embeddings and Transformer
⭐code - Dataset Condensation With Distribution Matching
- HyperPosePDF - Hypernetworks Predicting the Probability Distribution on SO(3)
- RANCER: Non-Axis Aligned Anisotropic Certification with Randomized Smoothing
- Match Cutting: Finding Cuts with Smooth Visual Transitions
- SIRA: Relightable Avatars from a Single Image
- Are Straight-Through gradients and Soft-Thresholding all you need for Sparse Training?
⭐code - Patch-based Privacy Preserving Neural Network for Vision Tasks
- Adaptive Sample Selection for Robust Learning under Label Noise
⭐code - Concept Correlation and Its Effects on Concept-Based Models
- Improving Saliency Models' Predictions of the Next Fixation With Humans' Intrinsic Cost of Gaze Shifts
- Mapping DNN Embedding Manifolds for Network Generalization Prediction
- GEMS: Generating Efficient Meta-Subnets
- Learning incoherent light emission steering from metasurfaces using generative models
⭐code - EfficientPhys: Enabling Simple, Fast and Accurate Camera-Based Cardiac Measurement
- Performance comparison of DVS data spatial downscaling methods using Spiking Neural Networks
⭐code - Encouraging Disentangled and Convex Representation with Controllable Interpolation Regularization
- A Protocol for Evaluating Model Interpretation Methods from Visual Explanations
- Learning Latent Structural Relations with Message Passing Prior
- Bootstrapping the Relationship Between Images and Their Clean and Noisy Labels
⭐code - ImPosing: Implicit Pose Encoding for Efficient Visual Localization
- GEMS: Scene Expansion using Generative Models of Graphs
- SONGs: Self-Organizing Neural Graphs
⭐code - Context-empowered Visual Attention Prediction in Pedestrian Scenarios
- Exploiting Long-Term Dependencies for Generating Dynamic Scene Graphs
- Image Registration
- Visual Reconstruction