A paper list of automatic video editing and its related computer vision tasks.
The papers are grouped into categories, which unavoidably overlap and are somewhat imprecise. I use icons to mark several frequent application scenarios: 💬(talk/meeting), 💃(dance/performance), ⚽🏀🏌️🎾(sports), 👗(ads/promotional videos), 🎬(movie), etc.
Note: This paper list does not cover work on image/video manipulation (e.g. content editing, object removal, video stylization).
LLM-Powered Editing.
[IUI 2024] LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video Editing [paper]
Using texts as input to automatically create video sequences from a collection of videos or images.
[ICMR 2023] Shot Retrieval and Assembly with Text Script for Video Montage Generation [paper] [code]
[MM 2022] Transcript to Video: Efficient Clip Sequencing from Texts [paper] [project page]
[CHI 2020] Generating Audio-Visual Slideshows from Text Articles Using Word Concreteness [paper]
[TOG 2019] Write-A-Video: Computational Video Montage from Themed Text [paper]
[TMM 2020] Story-driven Video Editing [paper]
[IMX 2020] Joint Attention for Automated Video Editing [paper] 💬
[UIST 2016] QuickCut: An Interactive Tool for Editing Narrated Video [paper]
Modifying the transcript of a speech to change the speech content or to remove filler words.
[TOG 2019] Text-based Editing of Talking-head Video [paper] 💬
[TOG 2012] Tools for Placing Cuts and Transitions in Interview Video [paper] 💬
To cut unedited videos into shots and/or to put them in an appropriate order.
[ICASSP 2024] Audio Match Cutting: Finding and Creating Matching Audio Transitions in Movies and Videos [paper] [project page]
[CVPR 2024] Towards Automated Movie Trailer Generation [paper] 🎬
[MM 2023] A Reinforcement Learning-Based Automatic Video Editing [paper]
[ICCV 2023 Workshop] Representation Learning of Next Shot Selection for Vlog Editing [paper]
[WACV 2023] Match Cutting: Finding Cuts with Smooth Visual Transitions [paper] [code] [project page] 🎬
[ACCV 2022] Multi-modal Segment Assemblage Network for Ad Video Editing with Importance-Coherence Reward [paper] [code] 👗
[SAC 2022] Automated Video Editing Based on Learned Styles Using LSTM-GAN [paper]
[ICCV 2021 Workshop] Learning Where To Cut From Edited Videos [paper]
[ICCV 2021] Learning to Cut by Watching Movies [paper] [code & dataset] [project page]
[WACV 2018] Learning Video-Story Composition via Recurrent Neural Network [paper]
[arXiv 2018] From Trailers to Storylines: An Efficient Way to Learn from Movies [paper] 🎬
[CVPR 2016] Video-Story Composition via Plot Analysis [paper]
To select video shots from multiple camera views or multiple takes of the same event.
[MIG 2023] Real-time Computational Cinematographic Editing for Broadcasting of Volumetric-captured events: an Application to Ultimate Fighting [paper] 🥊
[TOG 2022] PopStage: The Generation of Stage Cross-Editing Video Based on Spatio-Temporal Matching [paper] [project page] 💃
[ECCV 2022 Workshop] Temporal and Contextual Transformer for Multi-Camera Editing of TV Shows [paper]
[ICME 2021] Reinforcement Learning Based Automatic Personal Mashup Generation [paper]
[TOMCCAP 2021] Smart Director: An Event-Driven Directing System for Live Broadcasting [paper] ⚽
[ICISP 2018] Automatic Camera Selection in the Context of Basketball Game [paper] 🏀
[TOG 2017] Computational Video Editing for Dialogue-Driven Scenes [paper]
[ACE 2017] Automatic System for Editing Dance Videos Recorded Using Multiple Cameras [paper] 💃
[TOG 2014] Automatic Editing of Footage from Multiple Social Cameras [paper]
[CHI 2008] Improving Meeting Capture by Applying Television Production Principles with Audio and Motion Detection [paper] 💬
[ICME 2007] Automatic Multi-Modal Meeting Camera Selection for Video-Conferences and Meeting Browsers [paper] 💬
[CVPR 2023] Collaborative Noisy Label Cleaner: Learning Scene-aware Trailers for Multi-modal Highlight Detection in Movies [paper] [code] 🎬
[NeurIPS 2022 Workshop] Videogenic: Video Highlights via Photogenic Moments [paper] [project page]
[AutoUI 2021] Automatic Generation of Road Trip Summary Video for Reminiscence and Entertainment using Dashcam Video [paper]
[MM 2021] Automated Multi-Modal Video Editing for Ads Video [paper] 👗
[MM 2021] VideoDiscovery: An Automatic Short-Video Generation System for E-commerce Live-streaming [paper] [project page] 👗
[ECCV 2020] Learning Trailer Moments in Full-Length Movies [paper] 🎬
[MMAsia 2019] Domain Specific and Idiom Adaptive Video Summarization [paper]
[MM 2019] Personalized Video Summarization with Idiom Adaptation [paper]
[MM 2019] Generating 1 Minute Summaries of Day Long Egocentric Videos [paper] [code]
[TMM 2019] Automatic Curation of Sports Highlights Using Multimodal Excitement Features [paper] 🏌️🎾
[ICNC-FSKD 2019] Towards Data-Driven Automatic Video Editing [paper]
[CVPR 2018 Workshop] The Excitement of Sports: Automatic Highlights Using Audio/Visual Cues [paper] 🏌️🎾
[CVPR 2013] Story-Driven Summarization for Egocentric Video [paper]
[MM 2003] AVE: automated home video editing [paper]
[arXiv 2024] VCoME: Verbal Video Composition with Multimodal Editing Effects [paper] [code] 💬
[CHI 2024] ChunkyEdit: Text-first video interview editing via chunking [paper] 💬
[IUI 2024] ExpressEdit: Video Editing with Natural Language and Sketching [paper] [code] [project page]
[UIST 2023] Automated Conversion of Music Videos into Lyric Videos [paper] [project page]
[NeurIPS 2022 Workshop] VideoMap: Video Editing in Latent Space [paper] [project page]
[ECCV 2022] AutoTransition: Learning to Recommend Video Transition Effects [paper] [code] [dataset]
[IJCAI 2020 Demonstrations Track] An AI-Empowered Visual Storyline Generator [paper]
[AAAI 2020 Student Abstract] Generating Engaging Promotional Videos for E-commerce Platforms [paper] 👗
[UIST 2020] Automatic Video Creation From a Web Page [paper]
[TOM 2020] AutoFoley: Artificial Synthesis of Synchronized Sound Tracks for Silent Videos With Deep Learning [paper]
[CHI 2019] B-Script: Transcript-based B-roll Video Editing with Recommendations [paper]
To change the video speed.
[PRL 2023] A Multimodal Hyperlapse Method Based on Video and Songs' Emotion Alignment [paper]
[TPAMI 2023] Text-Driven Video Acceleration: A Weakly-Supervised Reinforcement Learning Method [paper] [project page]
[CVPR 2022 Workshop] Video-ReTime: Learning Temporally Varying Speediness for Time Remapping [paper]
[TPAMI 2020] A Sparse Sampling-Based Framework for Semantic Fast-Forward of First-Person Videos [paper]
[MM 2020] Automated Aesthetic Enhancement of Videos [paper]
[CVPR 2018] A Weighted Sparse Sampling and Smoothing Frame Transition Approach for Semantic Fast-Forward First-Person Videos [paper] [project page]
[ECCV 2016 Workshop] Towards Semantic Fast-Forward and Stabilized Egocentric Videos [paper]
[ICIP 2016] Fast-Forward Video Based on Semantic Extraction [paper]
[TOG 2015] Real-Time Hyperlapse Creation via Optimal Frame Selection [paper]
[TOG 2014] First-Person Hyper-Lapse Videos [paper]
[arXiv 2023] AutoMatch: A Large-scale Audio Beat Matching Benchmark for Boosting Deep Learning Assistant Video Editing [paper]
[SIBGRAPI 2021] Musical Hyperlapse: A Multimodal Approach to Accelerate First-Person Videos [paper]
[CVPR 2018 Workshop] Visual Rhythm and Beat [paper]
[TOG 2015] audeosynth: Music-Driven Video Montage [paper]
To crop the video based on actionness, aesthetics, etc.
[arXiv 2024] Reframe Anything: LLM Agent for Open World Video Reframing [paper]
[WACV 2024] Real Time GAZED: Online Shot Selection and Editing of Virtual Cameras from Wide-Angle Monocular Video Recordings [paper]
[CVPR 2020 Workshop] As Seen on TV: Automatic Basketball Video Production Using Gaussian-Based Actionness and Game States Recognition [paper] [project page] 🏀
[CHI 2020] GAZED – Gaze-guided Cinematic Editing of Wide-Angle Monocular Video Recordings [paper] 💃
[SA 2017 Poster] Aesthetic Temporal and Spatial Editing of Casual Videos [paper]
To extract the editing style of a source video and apply it to other footage.
[CVPR 2023] JAWS: Just A Wild Shot for Cinematic Transfer in Neural Radiance Fields [paper] [project page]
[CVPR 2021 Workshop] Editing Like Humans: A Contextual, Multimodal Framework for Automated Video Editing [paper] [project page]
[CVPR 2021 Workshop] Automatic Non-Linear Video Editing Transfer [paper]
[CVPR 2024] Cinematic Behavior Transfer via NeRF-based Differentiable Filming [paper] [project page]
[SIGGRAPH 2023 Poster] Dynamic Storyboard Generation in an Engine-based Virtual Environment for Video Production [paper] [project page]
[CHI 2021] Virtual Camera Layout Generation using a Reference Video [paper]
[TOMCCAP 2018] Thinking Like a Director: Film Editing Patterns for Virtual Cinematographic Storytelling [paper]
Datasets and papers related to video editing, camera movement 🎥, shot type 🖼️, etc.
[arXiv 2024] Edit3K: Universal Representation Learning for Video Editing Components [paper]
[CVPR 2024] Neighbor Relations Matter in Video Scene Detection [paper] [code]
[WACV 2024] Movie Genre Classification by Language Augmentation and Shot Sampling [paper] [code] 🎬
[IMXw 2023] Recognition of Camera Angle and Camera Level in Movies from Single Frames [paper] [project page] 🎬🖼️
[ICCV 2023 Workshop] LEMMS: Label Estimation of Multi-feature Movie Segments [paper] 🎬🖼️
[ICCV 2023] Long-range Multimodal Pretraining for Movie Understanding [paper] 🎬
[ECCV 2022 Workshop] Movie Lens: Discovering and Characterizing Editing Patterns in the Analysis of Short Movie Sequences [paper] 🎬
[ECCV 2022] The Anatomy of Video Editing: A Dataset and Benchmark Suite for AI-Assisted Video Editing [paper] [code & dataset] 🎬🎥🖼️
[ECCV 2022] MovieCuts: A New Dataset and Benchmark for Cut Type Recognition [paper] [code & dataset] 🎬
[ICIP 2022] HISTORIAN: A Large-Scale HISTORIcal Film Dataset with Cinematographic ANnotation [paper] [code & dataset] 🎬🎥
[ICIS Fall 2021] RO-TextCNN Based MUL-MOVE-Net for Camera Motion Classification [paper] [code & dataset] 🎥
[ICCV 2021 Workshop] High-Level Features for Movie Style Understanding [paper] 🎬🎥
[ECCV 2020] MovieNet: A Holistic Dataset for Movie Understanding [paper] [code] [project page & dataset] 🎬🎥🖼️
[ECCV 2020] A Unified Framework for Shot Type Classification Based on Subject Centric Lens [paper] [project page & dataset] 🎬🎥🖼️
[ICIP 2011] Using Context Saliency For Movie Shot Classification [paper] 🎬🖼️