LICENSE

ID R&D VoxTube Dataset 

The VoxTube dataset contains ~4.5M 4-second segments from more than 300K
unique utterances (taken from CC BY videos) of 5,040 YouTube channels. The
dataset comprises approximately 5K hours of speech.

The speakers span a wide range of ethnicities, accents, professions,
and ages.

We provide the YouTube URLs and timestamps for the dataset.

The data is licensed under a Creative Commons
Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
license (please read the license terms at
https://creativecommons.org/licenses/by-nc-sa/4.0/).

Downloading this dataset implies agreement to follow the same
conditions for any modification and/or
re-distribution of the dataset in any form.

Additionally, any entity using this dataset agrees to the following conditions:

THIS DATASET IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


Please cite [1] below if you make use of the dataset.

[1] I. Yakovlev, A. Okhotnikov, N. Torgashov, R. Makarov,
Y. Voevodin, K. Simonchik,
"VoxTube: a multilingual speaker recognition dataset,"
INTERSPEECH, 2023.