This repository lists video datasets that can be used to train models for coarse- to fine-grained temporal classification tasks (phase, step and action recognition).
Thank you to my colleague Luis C. Garcia-Peraza-Herrera for initiating the content and repo structure.
Dataset | Task | Annotations | Procedures | Paper |
--- | --- | --- | --- | --- |
CholecT50 | Every frame is annotated with labels from the triplet: instrument, verb and target for the recognition of instrument-tissue interaction in laparoscopic cholecystectomies. This novel challenge investigates the state-of-the-art on surgical fine-grained activity recognition. | action, tools, tissue | 50 | N/A |
Hei-Chole | A dataset with 33 laparoscopic cholecystectomy videos from three surgical centers with a total operation time of 22 hours was created. Labels included annotation of seven surgical phases with 250 phase transitions, 5514 occurrences of four surgical actions, 6980 occurrences of 21 surgical instruments from seven instrument categories and 495 skill classifications in five skill dimensions. The dataset was used in the 2019 Endoscopic Vision challenge, sub-challenge for surgical workflow and skill analysis. | phase, action, tools, skills | 24 (33 test not released) | Lena Maier-Hein et al. 2021 |
DAISI | DAISI uses images and instructions to provide step-by-step demonstrations of how to perform procedures from various medical disciplines. The data were drawn from real surgical procedures and academic textbooks. | captions | 13k images | Edgar Rojas-Munoz et al. 2020 |
Cholec80 | 80 videos of cholecystectomy surgeries performed by 13 surgeons. The videos are captured at 25 fps. The dataset is labeled with the phase (at 25 fps) and tool presence annotations (at 1 fps). A tool is defined as present in an image if at least half of the tool tip is visible. | phases, tools | 80 | Twinanda et al. 2016 |
CATARACTS | This dataset consists of 50 cataract surgery videos. It was annotated for two main tasks: surgical tool presence detection and surgical activity recognition. It was divided into two sets (train, test) for the surgical tool presence detection task and 3 sets (train, dev, test) for the activity recognition task. | phases, steps | 101 | N/A |
PETRAW | Recognize all levels of granularity of the surgical workflow (phases, steps, and action verbs) with different modality configurations. | phases, steps, actions | 100 | N/A |
MISAW | The “MIcro-Surgical Anastomose Workflow recognition on training sessions” (MISAW) sub-challenge was part of MICCAI 2020. Multi-granularity recognition: one model to recognize phases, steps and activities. Information: stereoscopic video, kinematic data, workflow annotation at 3 levels of granularity (phases, steps, and activities). | phases, steps, activities, actions | 27 | Huaulmé et al. MICCAI 2021 |
- ByPass40 - Strasbourg University
- MitiSW - MITI group at the Klinikum rechts der Isar in Munich
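
Several of the datasets above are annotated at different rates; Cholec80, for example, is described as having phase labels at 25 fps and tool presence labels at 1 fps. The following is a minimal sketch of one way to pair the two label streams. It assumes that the tool annotation for second `s` corresponds to frame `s * 25`; the function name, the in-memory label format and the example phase names are illustrative assumptions, not the datasets' actual file layout.

```python
from typing import List, Sequence, Tuple


def align_phase_to_tools(
    phase_labels: Sequence[str],
    tool_labels: Sequence[Sequence[int]],
    video_fps: int = 25,
) -> List[Tuple[int, str, Tuple[int, ...]]]:
    """Pair each 1 fps tool annotation with the phase label of the same frame.

    Assumption: the tool annotation at index `second` belongs to video frame
    `second * video_fps`. Annotations past the end of the phase labels are dropped.
    """
    aligned = []
    for second, tools in enumerate(tool_labels):
        frame = second * video_fps
        if frame >= len(phase_labels):
            break  # no phase label available for this tool annotation
        aligned.append((frame, phase_labels[frame], tuple(tools)))
    return aligned


# Hypothetical toy data: 3 seconds of video at 25 fps, two binary tool flags.
phases = ["Preparation"] * 50 + ["CalotTriangleDissection"] * 25
tools = [[1, 0], [1, 1], [0, 1]]

for row in align_phase_to_tools(phases, tools):
    print(row)
```

Each returned tuple holds the frame index, the phase label at that frame, and the tool presence flags, which is a convenient shape for training a joint phase and tool model on 1 fps samples.
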
Dataset | Task | Annotations | Procedures | Paper |
--- | --- | --- | --- | --- |
Kinetics | A collection of large-scale, high-quality datasets of URL links of up to 650,000 video clips that cover 400/600/700 human action classes, depending on the dataset version. Each clip is human annotated with a single action class and lasts around 10 seconds. | action | 700/400 | Lucas Smaira (DeepMind) 2020 |
Breakfast | The Breakfast Actions Dataset comprises 10 actions related to breakfast preparation, performed by 52 different individuals in 18 different kitchens. | action | 77 hours | H. Kuehne CVPR 2014 |
50 Salads | Activity recognition research has shifted focus from distinguishing full-body motion patterns to recognizing complex interactions of multiple entities. | action, step | 50 | N/A |
Epic-Kitchens-100 | Largest dataset in first-person (egocentric) vision; multi-faceted, audio-visual, non-scripted recordings in native environments - i.e. the wearers' homes, capturing all daily activities in the kitchen over multiple days. | action, verb and noun | 100 | |
FineGym | FineGym is a dataset built on top of gymnastics videos. It provides temporal annotations at both action and sub-action levels with a three-level semantic hierarchy. | temporal action, sub-action and semantic three | 99 | N/A |
Dataset | Brief description | Images | Procedures | Paper |
--- | --- | --- | --- | --- |
SARAS-MESAD2021 | The dataset contains monocular digital recordings from the da Vinci Xi robotic system. It has two sub-datasets: MESAD-Real and MESAD-Phantom. MESAD-Real covers prostatectomy procedures recorded on human patients. It contains four sessions of complete prostatectomy procedures performed by expert surgeons on real patients. MESAD-Phantom is also designed for surgeon action detection during prostatectomy, but is composed of videos captured during procedures on phantoms used for the training of surgeons. MESAD-Real comprises 21 action classes and MESAD-Phantom covers a smaller list of 14 action classes. The two datasets share 11 action classes. | N/A | 9 | N/A |
Dataset | Brief description | Images | Procedures | Paper |
--- | --- | --- | --- | --- |
JIGSAWS | The JIGSAWS dataset consists of three components: kinematic data (Cartesian positions, orientations, velocities, angular velocities and gripper angle describing the motion of the manipulators), video data (stereo video captured from the endoscopic camera), and manual annotations of gestures (atomic surgical activity segment labels) and skill (global rating score using modified objective structured assessments of technical skills). | N/A | N/A | Gao et al. 2014 |
Cataract-101 | This dataset contains 101 videos of cataract surgeries annotated with two kinds of information: Anonymous ID and experience level of operating surgeon, and starting points of quasi-standardized operation phases in videos. | 1.3M | 101 | Schoeffmann et al. 2018 |
HeiCo | The dataset contains data from the ROBUST-MIS 2019 challenge and the Surgical Workflow Challenges from EndoVis 2017 and 2018. | 10K | 30 | Maier-Hein et al. 2020 |
PETRAW | Dataset for online automatic recognition of surgical workflow by using both kinematic and stereoscopic video information on a micro-anastomosis training task. | N/A | 100 | N/A |