This project was completed by three contributors (@RonaldErnst, @kolusask, @Icheler) as part of our studies at TUM, for the course Machine Learning for 3D Geometry.
We extended MVCNN by replacing its CNN layers with more up-to-date pre-trained networks. We also changed the base dataset by merging 'ShapeNet' and 'ModelNet' into a new dataset, which we call 'Unified'. Finally, we modified the underlying architecture by changing the pooling operation and its location to see how this affects performance.
The base model uses a VGG-16-like feature extractor for CNN1. We added several more recent pre-trained CNNs to measure how much the choice of feature extractor affects model performance. We also replaced the max-pooling used in all baselines with alternatives such as mean-pooling.
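The difference between max-pooling and mean-pooling over views can be illustrated with a minimal sketch. It uses plain Python lists instead of PyTorch tensors so it runs without dependencies, and the function name `pool_views` and the toy feature values are hypothetical, not taken from this repository. In PyTorch, the same reduction would typically be done with `torch.max` or `torch.mean` along the view dimension.

```python
from statistics import fmean


def pool_views(view_features, mode="max"):
    """Combine per-view feature vectors into a single shape descriptor.

    view_features: list of equal-length lists, one per rendered view.
    mode: "max" (element-wise maximum) or "mean" (element-wise average).
    NOTE: illustrative helper, not part of the repository's code.
    """
    if not view_features:
        raise ValueError("need at least one view")
    if mode == "max":
        reduce = max
    elif mode == "mean":
        reduce = fmean
    else:
        raise ValueError(f"unknown pooling mode: {mode}")
    # zip(*...) groups the i-th feature of every view together,
    # so each column is reduced across views.
    return [reduce(column) for column in zip(*view_features)]


# Three views, each with a 3-dimensional feature vector (toy values).
views = [
    [0.0, 2.0, 1.0],
    [4.0, 0.0, 1.0],
    [2.0, 1.0, 1.0],
]
max_pooled = pool_views(views, "max")
mean_pooled = pool_views(views, "mean")
print(max_pooled)
print(mean_pooled)
```

Max-pooling keeps only the strongest response per feature across views, while mean-pooling lets every view contribute equally to the descriptor.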
Model training performance for stage 1 on ModelNet with shaded images:
Model training performance for stage 2 on ModelNet with shaded images:
Architecture | ModelNet | Unified |
---|---|---|
VGG-16 | 95.03 | 85.37 |
ConvNext | 95.64 | 85.92 |
ResNet-18 | 94.95 | 85.86 |
ResNet-18 with Mean-Pool | 94.75 | 87.30 |
The environment.yml file has all the necessary dependencies to train the model yourself if you have conda installed.
conda env create -f environment.yml
The datasets have to be downloaded manually and then prepared using prepare_modelnet/shapenet_data.py from the tools folder.
Optionally, you can set up training to use wandb to log performance and save models online.
Models can then be trained using train_mvcnn.py. For the most up-to-date list of CLI arguments, check train_mvcnn.py directly.