The 1st Workshop on User-Centric Narrative Summarization of Long Videos
In conjunction with ACM MM 2022
Morning of 10 October 2022, Lisbon, Portugal

On-site Venue: Pav4 R1.07, Lisbon Congress Center (CCL), Lisbon, Portugal
Time: 9:00 - 13:00 (UTC+1)

Thanks for Joining!
  • 27 September 2022 : Detailed workshop program added.
  • 26 August 2022 : The workshop will take place on Oct 10, 2022 (Morning)
  • 19 July 2022 : Paper submission deadline has been extended to 23 July 2022.
  • 3 July 2022 : Paper submission deadline has been extended.
  • 31 May 2022 : Submission info became available.
  • 27 May 2022 : Important Dates updated.
  • 5 April 2022 : Website opened.
NarSUM workshop

With video capture devices becoming widely popular, the amount of video data generated per day has increased rapidly over the past few years. Browsing through hours of video data to retrieve useful information is a tedious and boring task. Video summarization plays a crucial role in addressing this issue.

Video summarization is a well-researched topic in the multimedia community. However, the focus so far has been limited to summarizing short videos (only a few minutes long). This workshop calls on researchers with relevant backgrounds to focus on novel solutions for user-centric narrative summarization of long videos. Specifically, the goal is to provide users with meaningful information and insights from long input videos, potentially captured from multiple cameras. Since the output of any video summarization task is ultimately consumed by humans, it is also important to include an element of storytelling, where the resulting summary is presented as a narrative that humans can easily understand. These aspects have not been adequately addressed in the existing literature.

This workshop will also discuss other important aspects of current video summarization research. For example, what is ‘important’ in a video and how to evaluate the quality of a generated summary are still subjective. Many works rely on human-annotated training data in which the relevance of each frame of the video is labeled (supervised learning methods), while others consider a summary good if the original video can be well reconstructed from it (unsupervised methods). However, most current works do not explicitly take into account the scene semantics of the video (e.g., scenes, objects, people, actions and relations), which are significant indicators of what is ‘important’ in a video.

Call for Papers

This workshop aims to bring together researchers in academia and industry to discuss topics related to the summarization of long videos, its applications, and other open problems.

Topics of interest include, but are not limited to:
  • Summarization of single/multiple long videos
  • Action and scene graph generation from videos
  • Narrative summary and video storytelling
  • Datasets for long video summarization
  • Novel evaluation criteria/metrics
  • Novel applications of long video summarization
  • Open challenges in video summarization
  • Privacy-preserving video summarization
Important Dates
  • Call for submission:
    1 April 2022
  • Submission portal opens:
    1 June 2022
  • Paper submission deadline:
    23 July 2022 (extended from 3 July 2022)
  • Notification to authors:
    31 July 2022
  • Camera-ready submission:
    7 August 2022
  • Workshop date:
    10 October 2022
  • The submission deadline is 11:59 p.m. Anywhere on Earth (AoE) on the stated deadline date.
The best paper award carries a cash prize of USD 500.

  • Submission link:
    All submissions to this workshop must be original works not under review at any other workshop, conference, or journal. Our workshop follows the same paper submission guidelines as the ACM MM 2022 main conference.
  • Paper format: Submitted papers (.pdf format) must use the ACM Article Template.
  • Paper length: Submitted papers may consist of 6 to 8 pages. Up to two additional pages may be added for references; these pages must contain only references. Overlength papers will be rejected without review. Optionally, you may upload supplementary material that complements your submission (50 MB limit). There is no distinction between long and short papers; authors may decide on the appropriate length of their paper themselves.
  • Blinding & review: All papers will undergo the same review process and review period. Submitted papers must conform to the “double-blind” review policy. Accepted papers will be published in the ACM Digital Library.
  • For detailed instructions, please refer to the paper submission instructions of the ACM MM 2022 main conference.
Invited Speakers

Ioannis (Yiannis) Patras

Queen Mary University of London
Ioannis Patras is a Professor in Computer Vision and Human Sensing in the School of Electronic Engineering and Computer Science at Queen Mary University of London. He is or has been on the organizing committees of ACM Multimedia 2022, Multimedia Modelling 2021, ICMR 2019, the IEEE Image, Video and Multidimensional Signal Processing Symposium 2018, ACM Multimedia 2013, ICMR 2011, Face and Gesture Recognition 2008 and BMVC 2009, and was the general chair of WIAMIS 2009. He is an associate editor of the journal Pattern Recognition and an area chair at all major computer vision conferences, including ICCV, ECCV, FG and BMVC. He is a senior member of IEEE and a member of the Visual Signal Processing and Communications Technical Committee (VSPC) of the CAS Society. His research interests lie in the areas of computer vision and human sensing using machine learning methodologies. This includes video analysis and summarization, multimodal affect analysis and non-verbal analysis of human behaviour.
Title: Video Summarization in the Deep Learning Era: Current Landscape and Future Directions
In this talk we will provide an overview of the field of video summarization, with a focus on the developments, trends and open challenges in the era of Deep Learning and Big Data. After a brief introduction to the problem, we will provide a broad taxonomy of the works in the area and the recent trends from multiple perspectives, including types of methodologies/architectures, supervision signals, and modalities. We will then present current datasets and evaluation protocols, highlighting their limitations and the challenges they pose. Finally, we will close by giving our perspective on the challenges in the field and on interesting future directions.

Manmohan Chandraker

University of California San Diego
Manmohan Chandraker is an associate professor at the University of California, San Diego and leads computer vision research at NEC Labs America. His research has been recognized with Google Research Awards in 2021, 2019 and 2018, the NSF CAREER Award in 2018, the Best Paper Award at CVPR 2014, an IEEE PAMI special issue on Best Papers from CVPR 2011, the 2009 CSE Dissertation Award for Best Thesis from UCSD and the Marr Prize honorable mention at ICCV 2007.
Title: Learning, Understanding and Interaction in Videos
Advances in mobile phone camera technologies and internet connectivity have made videos one of the most intuitive ways to communicate and share experiences. Millions of cameras deployed in our homes, offices and public spaces record videos for purposes ranging across safety, assistance, entertainment and many others. This talk describes some of our recent progress in learning, understanding and interaction with such digital media. It will introduce methods in unsupervised and self-supervised representation learning that allow video solutions to be efficiently deployed with minimal data curation. It will discuss how physical priors or human knowledge are leveraged to understand insights in videos ranging from three-dimensional scene properties to language-based descriptions. It will also illustrate how these insights allow us to augment or interact with digital media with unprecedented photorealism and ease.
This workshop will be a half-day event with invited talks, paper presentations, and a panel discussion. The panel comprises the invited speakers, and one of the organizers will act as moderator.

On-site venue: Pav4 R1.07, Lisbon Congress Center (CCL), Lisbon, Portugal
Date & time: October 10th, 2022, 9:00 to 13:00
All times given are according to Lisbon time (UTC+1)
09:00 - 09:05  Opening remarks
09:05 - 10:00  Invited talk and invited presentation (Session chair: Jianquan Liu, NEC Corporation)
  • 09:05 - 09:45  Invited talk: Learning, Understanding and Interaction in Videos
    Manmohan Chandraker (University of California San Diego)
  • 09:45 - 10:00  Invited presentation: Compute to Tell the Tale: Goal-Driven Narrative Generation (Brave New Idea paper, ACM MM 2022)
    Yongkang Wong (National University of Singapore)
10:00 - 10:45  Paper presentations (Session chair: Yongkang Wong, National University of Singapore)
  • 10:00 - 10:15  Paper: Narrative Dataset: Towards Goal-Driven Narrative Generation
    Karen Stephen (NEC Corporation), Rishabh Sheoran (National University of Singapore), Satoshi Yamazaki (NEC Corporation)
  • 10:15 - 10:30  Paper: Soccer Game Summarization using Audio Commentary, Metadata, and Captions
    Sushant Gautam (Tribhuvan University), Cise Midoglu (Simula Metropolitan Center for Digital Engineering), Saeed Shafiee Sabet (Simula Metropolitan Center for Digital Engineering), Dinesh Baniya Kshatri (Tribhuvan University), Pål Halvorsen (Simula Metropolitan Center for Digital Engineering)
  • 10:30 - 10:45  Paper: Contrastive Representation Learning for Expression Recognition from Masked Face Images
    Fanxing Luo (Ritsumeikan University), Longjiao Zhao (Nagoya University), Yu Wang (Hitotsubashi University), Jien Kato (Ritsumeikan University)
10:45 - 11:00  Break
11:00 - 12:30  Invited talk and panel discussion (Session chair: Mohan Kankanhalli, National University of Singapore)
  • 11:00 - 11:40  Invited talk: Video Summarization in the Deep Learning Era: Current Landscape and Future Directions
    Ioannis Patras (Queen Mary University of London)
  • 11:40 - 12:30  Panel discussion: Emerging Topics on Video Summarization
12:30 - 12:35  Closing
Organizing Committee
Organizing Co-chairs
Karen Stephen
Visual Intelligence Research Laboratories, NEC Corporation, Japan
Rishabh Sheoran
National University of Singapore, Singapore
Anusha Bhamidipati
Biometrics Research Laboratories, NEC Corporation, Japan
Program Committee