Temporally Consistent Unbalanced Optimal Transport for Unsupervised Action Segmentation

Ming Xu, Stephen Gould

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

6 Citations (Scopus)

Abstract

We propose a novel approach to the action segmentation task for long, untrimmed videos, based on solving an opti-mal transport problem. By encoding a temporal consistency prior into a Gromov- Wasserstein problem, we are able to decode a temporally consistent segmentation from a noisy affinity/matching cost matrix between video frames and action classes. Unlike previous approaches, our method does not require knowing the action order for a video to attain temporal consistency. Furthermore, our resulting (fused) Gromov- Wasserstein problem can be efficiently solved on GPUs using a few iterations of projected mirror descent. We demonstrate the effectiveness of our method in an unsu-pervised learning setting, where our method is used to gen-erate pseudo-labels for self-training. We evaluate our seg-mentation approach and unsupervised learning pipeline on the Breakfast, 50-Salads, YouTube Instructions and Desk-top Assembly datasets, yielding state-of-the-art results for the unsupervised video action segmentation task.

Original languageEnglish
Title of host publicationProceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
PublisherIEEE Computer Society
Pages14618-14627
Number of pages10
ISBN (Electronic)9798350353006
DOIs
Publication statusPublished - 2024
Event2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024 - Seattle, United States
Duration: 16 Jun 202422 Jun 2024

Publication series

NameProceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
ISSN (Print)1063-6919

Conference

Conference2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
Country/TerritoryUnited States
CitySeattle
Period16/06/2422/06/24

Fingerprint

Dive into the research topics of 'Temporally Consistent Unbalanced Optimal Transport for Unsupervised Action Segmentation'. Together they form a unique fingerprint.

Cite this