The Unsupervised Learning 2023 project focuses on translating manuscripts between image alphabets (EMNIST and KMNIST) using unsupervised learning methods: autoencoders (CAE and VAE), dimensionality reduction (PCA, UMAP), and clustering (K-Means, GMM). To simulate a real-world problem, noise was added to the generated manuscripts (salt and pepper, lines, and rotation and scaling of letters).
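As an illustration of the salt-and-pepper noise mentioned above, here is a minimal NumPy sketch. It is not the project's own implementation: the function name `add_salt_and_pepper`, the `amount` and `seed` parameters, and the assumption of 8-bit grayscale images are all hypothetical.

```python
import numpy as np


def add_salt_and_pepper(image, amount=0.05, seed=None):
    """Return a noisy copy of a grayscale uint8 image.

    Roughly `amount` of the pixels are corrupted: about half of them
    become 0 (pepper) and about half become 255 (salt).
    Hypothetical helper, not taken from the project's code.
    """
    rng = np.random.default_rng(seed)
    noisy = image.copy()                  # leave the input untouched
    mask = rng.random(image.shape)        # uniform values in [0, 1)
    noisy[mask < amount / 2] = 0          # pepper
    noisy[mask > 1 - amount / 2] = 255    # salt
    return noisy
```

Passing a fixed `seed` makes the noise reproducible, which fits the seeded sampling used in the steps below.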
- Download datasets and text: download.py
- (optional) Review spreadsheets of random samples to choose seeds: generate_spreadsheet.py
- (even more optional) Preview random (seeded) samples: preview.py
- Construct mapping from random (seeded) samples: generate_mapping.py
- Construct pages in both alphabets from mapping: generate_dataset.py
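The steps above can be chained with a small driver script. This is only a sketch: it assumes each script runs without command-line arguments from the repository root, which the list does not confirm, and the names `STEPS`, `pipeline_commands`, and `run_pipeline` are hypothetical.

```python
import subprocess
import sys

# Script order taken from the list above; the flag marks the optional steps.
STEPS = [
    ("download.py", False),
    ("generate_spreadsheet.py", True),
    ("preview.py", True),
    ("generate_mapping.py", False),
    ("generate_dataset.py", False),
]


def pipeline_commands(include_optional=False):
    """Build one command per step, skipping optional steps unless requested."""
    return [
        [sys.executable, script]
        for script, optional in STEPS
        if include_optional or not optional
    ]


def run_pipeline(include_optional=False):
    """Run the steps in order and stop at the first failing script."""
    for command in pipeline_commands(include_optional):
        # Assumption: the scripts need no arguments.
        subprocess.run(command, check=True)
```

Calling `run_pipeline()` would run only the three required steps, while `run_pipeline(include_optional=True)` would also run the spreadsheet and preview steps.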