
This repository contains video datasets that can be used for training coarse to fine-grained (phase, step and action) temporal classification tasks.


maxboels/video-action-recognition-datasets


Description

This repository lists video datasets for training models on temporal classification tasks at coarse to fine granularity (phase, step, and action).

Thank you to my colleague Luis C. Garcia-Peraza-Herrera for initiating the content and repo structure.

Surgical video datasets

| Dataset | Task | Annotations | Procedures | Paper |
| --- | --- | --- | --- | --- |
| CholecT50 | Every frame is annotated with a triplet label (instrument, verb, target) for recognizing instrument-tissue interaction in laparoscopic cholecystectomies. The challenge investigates the state of the art in fine-grained surgical activity recognition. | action, tools, tissue | 50 | N/A |
| Hei-Chole | 33 laparoscopic cholecystectomy videos from three surgical centers, with a total operation time of 22 hours. Labels cover seven surgical phases with 250 phase transitions, 5514 occurrences of four surgical actions, 6980 occurrences of 21 surgical instruments from seven instrument categories, and 495 skill classifications in five skill dimensions. The dataset was used in the 2019 Endoscopic Vision challenge, sub-challenge for surgical workflow and skill analysis. | phase, action, tools, skills | 24 (33 test not released) | Lena Maier-Hein et al. 2021 |
| DAISI | DAISI uses images and instructions to provide step-by-step demonstrations of how to perform procedures from various medical disciplines. The dataset was acquired from real surgical procedures and from academic textbooks. | captions | 13k images | Edgar Rojas-Munoz et al. 2020 |
| Cholec80 | 80 videos of cholecystectomy surgeries performed by 13 surgeons, captured at 25 fps. The dataset is labeled with phase annotations (at 25 fps) and tool presence annotations (at 1 fps). A tool is defined as present in an image if at least half of the tool tip is visible. | phases, tools | 80 | Twinanda et al. 2016 |
| CATARACTS | This dataset consists of 50 cataract surgeries. It was annotated for two main tasks: surgical tool presence detection and surgical activity recognition. It was divided into two sets (train, test) for the tool presence detection task and three sets (train, dev, test) for the activity recognition task. | phases, steps | 101 | N/A |
| PETRAW | Recognize all levels of granularity of the surgical workflow (phases, steps, and action verbs) with different modality configurations. | phases, steps, actions | 100 | N/A |
| MISAW | The “MIcro-Surgical Anastomose Workflow recognition on training sessions” (MISAW) sub-challenge was part of MICCAI 2020. Multi-granularity recognition: one model recognizes phases, steps, and activities. Data: stereoscopic video, kinematic data, and workflow annotations at three levels of granularity (phases, steps, and activities). | phases, steps, activities, actions | 27 | Huaulmé et al. MICCAI 2021 |
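
Cholec80 annotates phases at 25 fps but tool presence at 1 fps, so the two label streams usually need to be aligned before they can be used together. The following is a minimal sketch, not part of any dataset's official tooling. It assumes the per-frame phase labels are already loaded into a Python list and that the 1 fps annotations correspond to every 25th frame starting at frame 0. The function name `downsample_labels` and the toy labels `phase_a` and `phase_b` are hypothetical.

```python
from typing import List, Sequence


def downsample_labels(
    frame_labels: Sequence[str], src_fps: int = 25, dst_fps: int = 1
) -> List[str]:
    """Keep one label per destination frame by taking every (src_fps // dst_fps)-th label.

    Assumption: the destination frames start at source frame 0.
    """
    if src_fps % dst_fps != 0:
        raise ValueError("src_fps must be an integer multiple of dst_fps")
    step = src_fps // dst_fps
    return list(frame_labels[::step])


# Toy data (hypothetical labels): 3 seconds at 25 fps, phase changes after 2 seconds.
phase_25fps = ["phase_a"] * 50 + ["phase_b"] * 25
phase_1fps = downsample_labels(phase_25fps)
print(phase_1fps)  # ['phase_a', 'phase_a', 'phase_b']
```

Taking every 25th label is the simplest choice. If the real annotation files use a different frame offset, the slice start would need to be adjusted accordingly.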

Private datasets

  • ByPass40 - Strasbourg University
  • MitiSW - MITI group at the Klinikum rechts der Isar in Munich

Non-medical video datasets

| Dataset | Task | Annotations | Procedures | Paper |
| --- | --- | --- | --- | --- |
| Kinetics | A collection of large-scale, high-quality datasets of URL links to up to 650,000 video clips covering 400/600/700 human action classes, depending on the dataset version. Each clip is human-annotated with a single action class and lasts around 10 seconds. | action | 700/400 | Lucas Smaira (DeepMind) 2020 |
| Breakfast | The Breakfast Actions Dataset comprises 10 actions related to breakfast preparation, performed by 52 different individuals in 18 different kitchens. | action | 77 hours | H. Kuehne CVPR 2014 |
| 50 Salads | Activity recognition research has shifted focus from distinguishing full-body motion patterns to recognizing complex interactions of multiple entities. | action, step | 50 | N/A |
| Epic-Kitchens-100 | Largest dataset in first-person (egocentric) vision; multi-faceted, audio-visual, non-scripted recordings in native environments, i.e. the wearers' homes, capturing all daily activities in the kitchen over multiple days. | action, verb and noun | 100 | |
| FineGym | FineGym is a dataset built on top of gymnastics videos. It provides temporal annotations at both action and sub-action levels with a three-level semantic hierarchy. | temporal action, sub-action and semantic three | 99 | N/A |

Action bounding box detection

| Dataset | Brief description | Images | Procedures | Paper |
| --- | --- | --- | --- | --- |
| SARAS-MESAD2021 | Monocular digital recordings from the da Vinci Xi robotic system, in two sub-datasets: MESAD-Real and MESAD-Phantom. MESAD-Real covers prostatectomy procedures recorded on human patients and contains four sessions of the complete procedure performed by expert surgeons. MESAD-Phantom is also designed for surgeon action detection during prostatectomy, but consists of videos captured during procedures on phantoms used for training surgeons. MESAD-Real comprises 21 action classes and MESAD-Phantom a smaller list of 14 action classes. The two datasets share 11 action classes. | N/A | 9 | N/A |

Skill assessment and workflow recognition

| Dataset | Brief description | Images | Procedures | Paper |
| --- | --- | --- | --- | --- |
| JIGSAWS | The JIGSAWS dataset consists of three components: kinematic data (Cartesian positions, orientations, velocities, angular velocities, and gripper angle describing the motion of the manipulators), video data (stereo video captured from the endoscopic camera), and manual annotations of gestures (atomic surgical activity segment labels) and skill (global rating score using modified objective structured assessments of technical skills). | N/A | N/A | Gao et al. 2014 |
| Cataract-101 | 101 videos of cataract surgeries annotated with two kinds of information: the anonymous ID and experience level of the operating surgeon, and the starting points of quasi-standardized operation phases in the videos. | 1.3M | 101 | Schoeffmann et al. 2018 |
| HeiCo | The dataset contains data from the ROBUST-MIS 2019 challenge and the Surgical Workflow Challenges from EndoVis 2017 and 2018. | 10K | 30 | Maier-Hein et al. 2020 |
| PETRAW | Dataset for online automatic recognition of surgical workflow using both kinematic and stereoscopic video information on a micro-anastomosis training task. | N/A | 100 | N/A |

Repositories holding multiple datasets
