Improvements (#21)
- Removed CNN vectors option
- Code no longer errors on non-video files in the season directory (.DS_Store, for example); fixes #19, "Increase tolerance of non-video files"
- Added full-run test with some actual data
nielstenboom authored Sep 5, 2021
1 parent 1ed2d0f commit 31e502e
Showing 18 changed files with 42 additions and 421 deletions.
3 changes: 1 addition & 2 deletions .gitignore
@@ -1,9 +1,8 @@

tests/data/resized320
.vscode/
test/
annotations.csv
*.h5
*.mp4
*.p
videos

2 changes: 1 addition & 1 deletion Dockerfile
@@ -9,7 +9,7 @@ WORKDIR /opt/recurring-content-detector

RUN conda install python=3.6 -y && \
pip install . && \
apt-get update && \
apt-get update --allow-releaseinfo-change && \
apt-get install libglib2.0-0 -y && \
apt-get install -y libsm6 libxext6 libxrender-dev -y && \
apt-get install ffmpeg -y && \
17 changes: 7 additions & 10 deletions README.MD
@@ -1,7 +1,7 @@
[![Codacy Badge](https://app.codacy.com/project/badge/Grade/e263d84692974d38a0678f3090a09187)](https://www.codacy.com/manual/nielstenboom/recurring-content-detector?utm_source=github.com&utm_medium=referral&utm_content=nielstenboom/recurring-content-detector&utm_campaign=Badge_Grade)

# Recurring content detector (credits, recaps and previews detection)

**Update 05-09-2021: The CNN vectors were removed as they do not work on the Apple M1.**

This repository contains the code that was used to conduct experiments for a [master's thesis](https://github.com/nielstenboom/masterthesis/raw/master/main.pdf). The goal was to detect recaps, opening credits, closing credits and previews from video files in an unsupervised manner. This can be used to automate the labeling for the skip functionality of a VOD streaming service.

The experiments in the master's thesis were done in Jupyter notebooks, but the code in these got quite messy, so I packed it into a Python package so that it can be reused more easily.
@@ -30,9 +30,8 @@ To install the package, do the following steps (assuming you have an anaconda se
```bash
git clone https://github.com/nielstenboom/recurring-content-detector.git
cd recurring-content-detector
conda install faiss-cpu -c pytorch
conda install faiss-cpu=1.6.3 -c pytorch
pip install mkl
# optional step: change parameters in recurring_content_detector/config.py
pip install .
```

@@ -61,10 +60,9 @@ This will run the detection by building the color histogram feature vectors. Mak

The feature vector function can also be changed:
```python
# options for the function are ["CNN", "CH", "CTM"]
rcd.detect("/directory/with/season/videofiles", feature_vector_function="CNN")
# options for the function are ["CH", "CTM"]
rcd.detect("/directory/with/season/videofiles", feature_vector_function="CTM")
```
This will CNN vectors, which are a bit more accurate but take much longer to build.

The `detect` function has many more parameters that can be tweaked; the defaults are the values I got the best results with in my experiments.

@@ -166,13 +164,12 @@ Total recall = 0.853

## Tests

There's a few tests in the test directory. They can also be run in the docker container, make sure you created a `videos` directory with some episodes in it:
There's a few tests in the test directory. They can also be run in the docker container:
```
docker run -it -v $(pwd):/opt/recurring-content-detector nielstenboom/recurring-content-detector:latest python -m pytest -s
docker run -it -v $(pwd):/opt/recurring-content-detector nielstenboom/recurring-content-detector:latest python -m pytest
```

## Credits
- https://github.com/noagarcia/keras_rmac for the CNN vectors
- https://github.com/facebookresearch/faiss for the efficient matching of the feature vectors

## Final words
2 changes: 2 additions & 0 deletions recurring_content_detector/detector.py
@@ -204,6 +204,8 @@ def detect(video_dir, feature_vector_function="CH", annotations=None, artifacts_

# the video files used for the detection
videos = [f for f in os.listdir(video_dir) if os.path.isfile(os.path.join(video_dir, f))]
videos = [f for f in videos if video_functions.file_is_video(f)]

# make sure videos are sorted, use natural sort to correctly handle case of ep1 and ep10 in file names
videos = natsorted(videos, alg=ns.IGNORECASE)
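
The new filtering step relies on `video_functions.file_is_video`, whose body is not part of this diff. As a rough illustration only, a check of that kind could be based on the file extension. The names `VIDEO_EXTENSIONS`, `looks_like_video` and `filter_videos` below are hypothetical, and the project's actual helper may work differently (for example, by inspecting the file contents).

```python
import os

# Hypothetical extension whitelist; the project's real check may differ.
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".avi", ".mov", ".webm"}


def looks_like_video(filename):
    """Return True when the file name ends in a known video extension."""
    # os.path.splitext treats a leading dot as part of the name,
    # so ".DS_Store" yields an empty extension and is rejected.
    _, ext = os.path.splitext(filename)
    return ext.lower() in VIDEO_EXTENSIONS


def filter_videos(filenames):
    """Keep only the names that look like video files, preserving order."""
    return [f for f in filenames if looks_like_video(f)]
```

Applied to a directory listing, this would drop entries such as `.DS_Store` before the natural sort runs.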

11 changes: 0 additions & 11 deletions recurring_content_detector/featurevectors.py
@@ -5,9 +5,6 @@
import numpy as np
from math import sqrt

from . import keras_rmac


def get_frame(frame_index, video):
"""
Given a frame position number and the videocapture variable, returns the frame as an image object (numpy array)
@@ -52,12 +49,6 @@ def color_texture_moments(img):

return result


def cnn_feature_vectors(img):
feature_vector = keras_rmac.rmac.to_feature_vector(img)
return feature_vector


def get_img_color_hist(image, binsize):
"""
Given an image as input, output its color histogram as a numpy array.
@@ -100,8 +91,6 @@ def construct_feature_vectors(video_fn, result_dir_name, vector_function, framej
vector_function = color_hist
elif vector_function == "CTM":
vector_function = color_texture_moments
elif vector_function == "CNN":
vector_function = cnn_feature_vectors

# make sure folder of experimentname exists or create otherwise
os.makedirs(os.path.dirname(vectors_fn), exist_ok=True)
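
Judging from this hunk alone, a caller that still passes `"CNN"` would no longer match any branch of the `if`/`elif` chain. One possible way to surface that early is a lookup table that rejects unknown names. This is a sketch, not the project's code: `VECTOR_FUNCTIONS` and `resolve_vector_function` are hypothetical names, and the two feature functions are placeholders standing in for the real ones in `featurevectors.py`.

```python
# Placeholder feature functions; the real ones live in featurevectors.py.
def color_hist(img):
    return ("CH", img)


def color_texture_moments(img):
    return ("CTM", img)


# Hypothetical mapping from option name to feature function.
VECTOR_FUNCTIONS = {
    "CH": color_hist,
    "CTM": color_texture_moments,
}


def resolve_vector_function(name):
    """Return the feature function for `name`, or raise on unknown names."""
    try:
        return VECTOR_FUNCTIONS[name]
    except KeyError:
        options = ", ".join(sorted(VECTOR_FUNCTIONS))
        raise ValueError(
            "unknown feature vector function %r (options: %s)" % (name, options)
        ) from None
```

With a table like this, a removed option such as `"CNN"` fails with a clear message instead of being passed along as a plain string.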
120 changes: 0 additions & 120 deletions recurring_content_detector/keras_rmac/RoiPooling.py

This file was deleted.

1 change: 0 additions & 1 deletion recurring_content_detector/keras_rmac/__init__.py

This file was deleted.

Binary file not shown.
Empty file.
60 changes: 0 additions & 60 deletions recurring_content_detector/keras_rmac/get_regions.py

This file was deleted.
