Egocentric Action Understanding (EAU) aims to understand human actions from videos captured by first-person cameras.
This repository collects interesting papers in EAU to chart the development of the EAU community.
💥 NEWS: ICLR 2025 papers have been added to the list.
- Exocentric To Egocentric Transfer For Action Recognition: A Short Survey (ArXiv 2024) [Paper]
- A Survey on 3D Egocentric Human Pose Estimation (CVPRW 2024) [Paper]
- An Outlook into the Future of Egocentric Vision (IJCV 2024) [Paper] [Citations]
- Acquisition through My Eyes and Steps: A Joint Predictive Agent Model in Egocentric Worlds (ArXiv 2025) [Paper]
- PRVQL: Progressive Knowledge-guided Refinement for Robust Egocentric Visual Query Localization (ArXiv 2025) [Paper]
- EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering (ArXiv 2025) [Paper] [Code]
- HD-EPIC: A Highly-Detailed Egocentric Video Dataset (ArXiv 2025) [Paper] [Project]
- Hier-EgoPack: Hierarchical Egocentric Video Understanding with Diverse Task Perspectives (ArXiv 2025) [Paper] [Project] [Code]
- EgoMe: Follow Me via Egocentric View in Real World (ArXiv 2025) [Paper]
- X-LeBench: A Benchmark for Extremely Long Egocentric Video Understanding (ArXiv 2025) [Paper]
- From My View to Yours: Ego-Augmented Learning in Large Vision Language Models for Understanding Exocentric Daily Living Activities (ArXiv 2025) [Paper]
- HaWoR: World-Space Hand Motion Reconstruction from Egocentric Videos (ArXiv 2025) [Paper] [Project]
- MM-Ego: Towards Building Egocentric Multimodal LLMs (ICLR 2025) [Paper]
- X-Gen: Ego-centric Video Prediction by Watching Exo-centric Videos (ICLR 2025) [Paper]
- Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning (ICLR 2025) [Paper]
- EgoSim: Egocentric Exploration in Virtual Worlds with Multi-Modal Conditioning (ICLR 2025) [Paper] [Project]
- Do Egocentric Video-Language Models Really Understand Hand-Object Interactions? (ICLR 2025) [Paper] [Code]
- EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting (3DV 2025) [Paper]
Yearly Keywords: Ego-LLM🔥, Ego-Motion🔥, New Dataset, 3D, Ego-Exo, Multi-Modality, HOI, Mistake Detection, Video Generation
- Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model (ArXiv 2024) [Paper] [Project]
- EgoCast: Forecasting Egocentric Human Pose in the Wild (ArXiv 2024) [Paper]
- Which Viewpoint Shows it Best? Language for Weakly Supervising View Selection in Multi-view Videos (ArXiv 2024) [Paper]
- EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation (ArXiv 2024) [Paper] [Project] [Code]
- VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI (ArXiv 2024) [Paper]
- EgoOops: A Dataset for Mistake Action Detection from Egocentric Videos with Procedural Texts (ArXiv 2024) [Paper] [Project]
- EgoLM: Multi-Modal Language Model of Egocentric Motions (ArXiv 2024) [Paper] [Project]
- EgoAvatar: Egocentric View-Driven and Photorealistic Full-body Avatars (ArXiv 2024) [Paper]
- Estimating Body and Hand Motion in an Ego-sensed World (ArXiv 2024) [Paper] [Project] [Code]
- HMD2: Environment-aware Motion Generation from Single Egocentric Head-Mounted Device (ArXiv 2024)
- EMHI: A Multimodal Egocentric Human Motion Dataset with HMD and Body-Worn IMUs (ArXiv 2024) [Paper]
- Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos (ArXiv 2024) [Paper] [Project] [Code]
- Unlocking Exocentric Video-Language Data for Egocentric Video Representation Learning (ArXiv 2024) [Paper]
- HUP-3D: A 3D multi-view synthetic dataset for assisted-egocentric hand-ultrasound pose estimation (ArXiv 2024) [Paper]
- PARSE-Ego4D: Personal Action Recommendation Suggestions for Egocentric Videos (ArXiv 2024) [Paper] [Project]
- Egocentric Vision Language Planning (ArXiv 2024) [Paper]
- Diff-IP2D: Diffusion-Based Hand-Object Interaction Prediction on Egocentric Videos (ArXiv 2024) [Paper]
- Spatial Cognition from Egocentric Video: Out of Sight, Not Out of Mind (ArXiv 2024) [Paper] [Project]
- EFM3D: A Benchmark for Measuring Progress Towards 3D Egocentric Foundation Models (ArXiv 2024) [Paper]
- HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision (ArXiv 2024) [Paper] [Code]
- Intention-driven Ego-to-Exo Video Generation (ArXiv 2024) [Paper]
- Differentiable Task Graph Learning: Procedural Activity Representation and Online Mistake Detection from Egocentric Videos (NeurIPS 2024) [Paper] [Code]
- Exocentric-to-Egocentric Video Generation (NeurIPS 2024) [Paper] [Code]
- HENASY: Learning to Assemble Scene-Entities for Interpretable Egocentric Video-Language Model (NeurIPS 2024) [Paper]
- EgoSim: An Egocentric Multi-view Simulator for Body-worn Cameras during Human Motion (NeurIPS 2024)
- Estimating Ego-Body Pose from Doubly Sparse Egocentric Video Data (NeurIPS 2024) [Paper]
- EgoChoir: Capturing 3D Human-Object Interaction Regions from Egocentric Views (NeurIPS 2024) [Paper] [Code] [Project]
- Ego3DT: Tracking Every 3D Object in Ego-centric Videos (ACMMM 2024)
- 4Diff: 3D-Aware Diffusion Model for Third-to-First Viewpoint Translation (ECCV 2024) [Paper] [Project]
- Multimodal Cross-Domain Few-Shot Learning for Egocentric Action Recognition (ECCV 2024) [Paper] [Project] [Code]
- Masked Video and Body-worn IMU Autoencoder for Egocentric Action Recognition (ECCV 2024) [Paper]
- AMEGO: Active Memory from long EGOcentric videos (ECCV 2024) [Paper] [Project] [Code]
- Are Synthetic Data Useful for Egocentric Hand-Object Interaction Detection? (ECCV 2024) [Paper] [Project] [Code]
- ActionVOS: Actions as Prompts for Video Object Segmentation (ECCV 2024) [Paper] [Code]
- Discovering Novel Actions from Open World Egocentric Videos with Object-Grounded Visual Commonsense Reasoning (ECCV 2024) [Paper]
- Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos (ECCV 2024) [Paper] [Project]
- Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation (ECCV 2024) [Paper]
- Spherical World-Locking for Audio-Visual Localization in Egocentric Videos (ECCV 2024) [Paper] [Project]
- LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning (ECCV 2024) [Paper] [Project] [Code]
- Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects (ECCV 2024) [Paper]
- EgoPoser: Robust Real-Time Ego-Body Pose Estimation in Large Scenes (ECCV 2024) [Paper]
- EgoPoseFormer: A Simple Baseline for Egocentric 3D Human Pose Estimation (ECCV 2024) [Paper]
- EgoBody3M: Egocentric Body Tracking on a VR Headset using a Diverse Dataset (ECCV 2024) [Paper]
- 3D Hand Pose Estimation in Everyday Egocentric Images (ECCV 2024) [Paper] [Project]
- Put Myself in Your Shoes: Lifting the Egocentric Perspective from Exocentric Videos (ECCV 2024) [Paper] [Citations]
- EgoLifter: Open-world 3D Segmentation for Egocentric Perception (ECCV 2024) [Project] [Paper]
- Nymeria: A Massive Collection of Multimodal Egocentric Daily Motion in the Wild (ECCV 2024) [Paper] [Project]
- EgoExo-Fitness: Towards Egocentric and Exocentric Full-Body Action Understanding (ECCV 2024) [Paper] [Code]
- Ex2Eg-MAE: A Framework for Adaptation of Exocentric Video Masked Autoencoders for Egocentric Social Role Understanding (ECCV 2024) [Paper]
- Synchronization is All You Need: Exocentric-to-Egocentric Transfer for Temporal Action Segmentation with Unlabeled Synchronized Video Pairs (ECCV 2024) [Paper] [Code]
- EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval (ECCV 2024) [Paper] [Code]
- SimpleEgo: Predicting Probabilistic Body Pose from Egocentric Cameras (3DV 2024) [Paper]
- Multi-Factor Adaptive Vision Selection for Egocentric Video Question Answering (ICML 2024) [Paper] [Code]
- Grounded Question-Answering in Long Egocentric Videos (CVPR 2024) [Paper]
- Learning to Segment Referred Objects from Narrated Egocentric Videos (CVPR 2024) [Paper]
- Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives (CVPR 2024) [Project] [Paper] [Citations]
- EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World (CVPR 2024) [Paper] [Code]
- EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models (CVPR 2024) [Paper]
- The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective (CVPR 2024) [Paper]
- Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos (CVPR 2024) [Paper]
- SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos (CVPR 2024)
- PREGO: Online Mistake Detection in PRocedural EGOcentric Videos (CVPR 2024) [Paper] [Code]
- Error Detection in Egocentric Procedural Task Videos (CVPR 2024) [Paper] [Code]
- 3D Human Pose Perception from Egocentric Stereo Videos (CVPR 2024)
- EventEgo3D: 3D Human Motion Capture from Egocentric Event Streams (CVPR 2024) [Project] [Paper]
- Egocentric Whole-Body Motion Capture with FisheyeViT and Diffusion-Based Motion Refinement (CVPR 2024) [Paper]
- Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting (CVPR 2024) [Paper]
- Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation (CVPR 2024) [Paper] [Code]
- Real-Time Simulated Avatar from Head-Mounted Sensors (CVPR 2024) [Project] [Paper]
- Instance Tracking in 3D Scenes from Egocentric Videos (CVPR 2024) [Paper] [Code]
- X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization (CVPR 2024) [Code]
- A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives (CVPR 2024) [Paper] [Project]
- Progress-Aware Online Action Segmentation for Egocentric Procedural Task Videos (CVPR 2024) [Paper] [Code]
- Action Scene Graphs for Long-Form Understanding of Egocentric Videos (CVPR 2024) [Paper] [Code]
- Retrieval-Augmented Egocentric Video Captioning (CVPR 2024) [Paper] [Citations] [Project] [Code]
- OAKINK2: A Dataset of Bimanual Hands-Object Manipulation in Complex Task Completion (CVPR 2024) [Paper] [Project] [Citations]
- EgoGen: An Egocentric Synthetic Data Generator (CVPR 2024) [Paper] [Project] [Code]
- Fusing Personal and Environmental Cues for Identification and Segmentation of First-Person Camera Wearers in Third-Person Views (CVPR 2024) [Paper]
- Ego-Body Pose Estimation via Ego-Head Pose Estimation (CVPR 2023) [Project] [Paper] [Code] [Citations]
- IndustReal: A Dataset for Procedure Step Recognition Handling Execution Errors in Egocentric Videos in an Industrial-Like Setting (WACV 2024) [Paper] [Code]
- EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone (ICCV 2023) [Paper] [Project] [Code] [Citations]
- Weakly-Supervised Action Segmentation and Unseen Error Detection in Anomalous Instructional Videos (ICCV 2023) [Project] [Paper]
- HoloAssist: an Egocentric Human Interaction Dataset for Interactive AI Assistants in the Real World (ICCV 2023) [Project] [Paper] [Citations]
- EgoHumans: An Egocentric 3D Multi-Human Benchmark (ICCV 2023) [Paper] [Code] [Citations]
- CaptainCook4D: A dataset for understanding errors in procedural activities (ICMLW 2023) [Project] [Paper] [Code]
- Every Mistake Counts in Assembly (ArXiv 2023) [Paper] [Code]
- Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities (CVPR 2022) [Project] [Paper] [Code] [Citations]