Egocentric Action Understanding (EAU) aims to understand human actions from videos captured by first-person cameras.
This repository collects interesting papers in EAU to chart the development of the EAU community.
💥 NEWS: ICLR 2025 papers have been added to the list.
- Exocentric To Egocentric Transfer For Action Recognition: A Short Survey (ArXiv 2024) [Paper]
- A Survey on 3D Egocentric Human Pose Estimation (CVPRW 2024) [Paper]
- An Outlook into the Future of Egocentric Vision (IJCV 2024) [Paper] [Citations]
- Acquisition through My Eyes and Steps: A Joint Predictive Agent Model in Egocentric Worlds (ArXiv 2025) [Paper]
- PRVQL: Progressive Knowledge-guided Refinement for Robust Egocentric Visual Query Localization (ArXiv 2025) [Paper]
- EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering (ArXiv 2025) [Paper] [Code]
- HD-EPIC: A Highly-Detailed Egocentric Video Dataset (ArXiv 2025) [Paper] [Project]
- Hier-EgoPack: Hierarchical Egocentric Video Understanding with Diverse Task Perspectives (ArXiv 2025) [Paper] [Project] [Code]
- EgoMe: Follow Me via Egocentric View in Real World (ArXiv 2025) [Paper]
- X-LeBench: A Benchmark for Extremely Long Egocentric Video Understanding (ArXiv 2025) [Paper]
- From My View to Yours: Ego-Augmented Learning in Large Vision Language Models for Understanding Exocentric Daily Living Activities (ArXiv 2025) [Paper]
- HaWoR: World-Space Hand Motion Reconstruction from Egocentric Videos (ArXiv 2025) [Paper] [Project]
- MM-Ego: Towards Building Egocentric Multimodal LLMs (ICLR 2025) [Paper]
- X-Gen: Ego-centric Video Prediction by Watching Exo-centric Videos (ICLR 2025) [Paper]
- Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning (ICLR 2025) [Paper]
- EgoSim: Egocentric Exploration in Virtual Worlds with Multi-Modal Conditioning (ICLR 2025) [Paper] [Project]
- Do Egocentric Video-Language Models Really Understand Hand-Object Interactions? (ICLR 2025) [Paper] [Code]
- EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting (3DV 2025) [Paper]
Yearly Keywords: Ego-LLM🔥, Ego-Motion🔥, New Dataset, 3D, Ego-Exo, Multi-Modality, HOI, Mistake Detection, Video Generation
- Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model (ArXiv 2024) [Paper] [Project]
- EgoCast: Forecasting Egocentric Human Pose in the Wild (ArXiv 2024) [Paper]
- Which Viewpoint Shows it Best? Language for Weakly Supervising View Selection in Multi-view Videos (ArXiv 2024) [Paper]
- EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation (ArXiv 2024) [Paper] [Project] [Code]
- VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI (ArXiv 2024) [Paper]
- EgoOops: A Dataset for Mistake Action Detection from Egocentric Videos with Procedural Texts (ArXiv 2024) [Paper] [Project]
- EgoLM: Multi-Modal Language Model of Egocentric Motions (ArXiv 2024) [Paper] [Project]
- EgoAvatar: Egocentric View-Driven and Photorealistic Full-body Avatars (ArXiv 2024) [Paper]
- Estimating Body and Hand Motion in an Ego-sensed World (ArXiv 2024) [Paper] [Project] [Code]
- HMD2: Environment-aware Motion Generation from Single Egocentric Head-Mounted Device (ArXiv 2024)
- EMHI: A Multimodal Egocentric Human Motion Dataset with HMD and Body-Worn IMUs (ArXiv 2024) [Paper]
- Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos (ArXiv 2024) [Paper] [Project] [Code]
- Unlocking Exocentric Video-Language Data for Egocentric Video Representation Learning (ArXiv 2024) [Paper]
- HUP-3D: A 3D multi-view synthetic dataset for assisted-egocentric hand-ultrasound pose estimation (ArXiv 2024) [Paper]
- PARSE-Ego4D: Personal Action Recommendation Suggestions for Egocentric Videos (ArXiv 2024) [Paper] [Project]
- Egocentric Vision Language Planning (ArXiv 2024) [Paper]
- Diff-IP2D: Diffusion-Based Hand-Object Interaction Prediction on Egocentric Videos (ArXiv 2024) [Paper]
- Spatial Cognition from Egocentric Video: Out of Sight, Not Out of Mind (ArXiv 2024) [Paper] [Project]
- EFM3D: A Benchmark for Measuring Progress Towards 3D Egocentric Foundation Models (ArXiv 2024) [Paper]
- HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision (ArXiv 2024) [Paper] [Code]
- Intention-driven Ego-to-Exo Video Generation (ArXiv 2024) [Paper]
- Differentiable Task Graph Learning: Procedural Activity Representation and Online Mistake Detection from Egocentric Videos (NeurIPS 2024) [Paper] [Code]
- Exocentric-to-Egocentric Video Generation (NeurIPS 2024) [Paper] [Code]
- HENASY: Learning to Assemble Scene-Entities for Interpretable Egocentric Video-Language Model (NeurIPS 2024) [Paper]
- EgoSim: An Egocentric Multi-view Simulator for Body-worn Cameras during Human Motion (NeurIPS 2024)
- Estimating Ego-Body Pose from Doubly Sparse Egocentric Video Data (NeurIPS 2024) [Paper]
- EgoChoir: Capturing 3D Human-Object Interaction Regions from Egocentric Views (NeurIPS 2024) [Paper] [Code] [Project]
- Ego3DT: Tracking Every 3D Object in Ego-centric Videos (ACMMM 2024)
- 4Diff: 3D-Aware Diffusion Model for Third-to-First Viewpoint Translation (ECCV 2024) [Paper] [Project]
- Multimodal Cross-Domain Few-Shot Learning for Egocentric Action Recognition (ECCV 2024) [Paper] [Project] [Code]
- Masked Video and Body-worn IMU Autoencoder for Egocentric Action Recognition (ECCV 2024) [Paper]
- AMEGO: Active Memory from long EGOcentric videos (ECCV 2024) [Paper] [Project] [Code]
- Are Synthetic Data Useful for Egocentric Hand-Object Interaction Detection? (ECCV 2024) [Paper] [Project] [Code]
- ActionVOS: Actions as Prompts for Video Object Segmentation (ECCV 2024) [Paper] [Code]
- Discovering Novel Actions from Open World Egocentric Videos with Object-Grounded Visual Commonsense Reasoning (ECCV 2024) [Paper]
- Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos (ECCV 2024) [Paper] [Project]
- Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation (ECCV 2024) [Paper]
- Spherical World-Locking for Audio-Visual Localization in Egocentric Videos (ECCV 2024) [Paper] [Project]
- LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning (ECCV 2024) [Paper] [Project] [Code]
- Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects (ECCV 2024) [Paper]
- EgoPoser: Robust Real-Time Ego-Body Pose Estimation in Large Scenes (ECCV 2024) [Paper]
- EgoPoseFormer: A Simple Baseline for Egocentric 3D Human Pose Estimation (ECCV 2024) [Paper]
- EgoBody3M: Egocentric Body Tracking on a VR Headset using a Diverse Dataset (ECCV 2024) [Paper]
- 3D Hand Pose Estimation in Everyday Egocentric Images (ECCV 2024) [Paper] [Project]
- Put Myself in Your Shoes: Lifting the Egocentric Perspective from Exocentric Videos (ECCV 2024) [Paper] [Citations]
- EgoLifter: Open-world 3D Segmentation for Egocentric Perception (ECCV 2024) [Project] [Paper]
- Nymeria: A Massive Collection of Multimodal Egocentric Daily Motion in the Wild (ECCV 2024) [Paper] [Project]
- EgoExo-Fitness: Towards Egocentric and Exocentric Full-Body Action Understanding (ECCV 2024) [Paper] [Code]
- Ex2Eg-MAE: A Framework for Adaptation of Exocentric Video Masked Autoencoders for Egocentric Social Role Understanding (ECCV 2024) [Paper]
- Synchronization is All You Need: Exocentric-to-Egocentric Transfer for Temporal Action Segmentation with Unlabeled Synchronized Video Pairs (ECCV 2024) [Paper] [Code]
- EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval (ECCV 2024) [Paper] [Code]
- SimpleEgo: Predicting Probabilistic Body Pose from Egocentric Cameras (3DV 2024) [Paper]
- Multi-Factor Adaptive Vision Selection for Egocentric Video Question Answering (ICML 2024) [Paper] [Code]
- Grounded Question-Answering in Long Egocentric Videos (CVPR 2024) [Paper]
- Learning to Segment Referred Objects from Narrated Egocentric Videos (CVPR 2024) [Paper]
- Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives (CVPR 2024) [Project] [Paper] [Citations]
- EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World (CVPR 2024) [Paper] [Code]
- EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models (CVPR 2024) [Paper]
- The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective (CVPR 2024) [Paper]
- Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos (CVPR 2024) [Paper]
- SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos (CVPR 2024)
- PREGO: Online Mistake Detection in PRocedural EGOcentric Videos (CVPR 2024) [Paper] [Code]
- Error Detection in Egocentric Procedural Task Videos (CVPR 2024) [Paper] [Code]
- 3D Human Pose Perception from Egocentric Stereo Videos (CVPR 2024)
- EventEgo3D: 3D Human Motion Capture from Egocentric Event Streams (CVPR 2024) [Project] [Paper]
- Egocentric Whole-Body Motion Capture with FisheyeViT and Diffusion-Based Motion Refinement (CVPR 2024) [Paper]
- Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting (CVPR 2024) [Paper]
- Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation (CVPR 2024) [Paper] [Code]
- Real-Time Simulated Avatar from Head-Mounted Sensors (CVPR 2024) [Project] [Paper]
- Instance Tracking in 3D Scenes from Egocentric Videos (CVPR 2024) [Paper] [Code]
- X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization (CVPR 2024) [Code]
- A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives (CVPR 2024) [Paper] [Project]
- Progress-Aware Online Action Segmentation for Egocentric Procedural Task Videos (CVPR 2024) [Paper] [Code]
- Action Scene Graphs for Long-Form Understanding of Egocentric Videos (CVPR 2024) [Paper] [Code]
- Retrieval-Augmented Egocentric Video Captioning (CVPR 2024) [Paper] [Citations] [Project] [Code]
- OAKINK2: A Dataset of Bimanual Hands-Object Manipulation in Complex Task Completion (CVPR 2024) [Paper] [Project] [Citations]
- EgoGen: An Egocentric Synthetic Data Generator (CVPR 2024) [Paper] [Project] [Code]
- Fusing Personal and Environmental Cues for Identification and Segmentation of First-Person Camera Wearers in Third-Person Views (CVPR 2024) [Paper]
- Ego-Body Pose Estimation via Ego-Head Pose Estimation (CVPR 2023) [Project] [Paper] [Code] [Citations]
- IndustReal: A Dataset for Procedure Step Recognition Handling Execution Errors in Egocentric Videos in an Industrial-Like Setting (WACV 2024) [Paper] [Code]
- EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone (ICCV 2023) [Paper] [Project] [Code] [Citations]
- Weakly-Supervised Action Segmentation and Unseen Error Detection in Anomalous Instructional Videos (ICCV 2023) [Project] [Paper]
- HoloAssist: an Egocentric Human Interaction Dataset for Interactive AI Assistants in the Real World (ICCV 2023) [Project] [Paper] [Citations]
- EgoHumans: An Egocentric 3D Multi-Human Benchmark (ICCV 2023) [Paper] [Code] [Citations]
- CaptainCook4D: A dataset for understanding errors in procedural activities (ICMLW 2023) [Project] [Paper] [Code]
- Every Mistake Counts in Assembly (ArXiv 2023) [Paper] [Code]
- Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities (CVPR 2022) [Project] [Paper] [Code] [Citations]