About Training Loss #37
500 epochs on CIFAR-10 might not be enough. Since the EMA decay is 0.9999, the EMA model needs around 100k iterations to generate reasonable images, and the model without EMA still needs around 50k iterations. Also, please check:
1. whether you have normalized the tokens according to your new autoencoder (the current normalization constant, 0.2325, is specific to our ImageNet tokenizer);
2. whether using 1000 diffusion steps instead of 100 helps, to see whether the large values come from the diffusion process.
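As an aside, here is a minimal sketch of how one might estimate a replacement for that normalization constant from a custom autoencoder's latents. `my_ae` and `train_loader` are placeholders, and whether the constant is applied as a multiplier or divisor should be checked against the codebase you are training with.

```python
import torch
from torch.utils.data import DataLoader

@torch.no_grad()
def compute_latent_scale(encode_fn, loader: DataLoader, device: str = "cuda") -> float:
    """Estimate the standard deviation of an autoencoder's latents over a dataset."""
    chunks = []
    for images, _ in loader:
        z = encode_fn(images.to(device))      # assumed to return the latent tensor
        chunks.append(z.flatten().cpu())
    std = torch.cat(chunks).std().item()
    print(f"latent std = {std:.4f}")
    return std

# Usage (placeholders: `my_ae` is your self-trained autoencoder,
# `train_loader` a CIFAR-10 DataLoader):
#   std = compute_latent_scale(my_ae.encode, train_loader)
#   scale = 1.0 / std   # candidate replacement for 0.2325, assuming the tokens are
#                       # multiplied by the constant so they end up near unit variance
```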
Does that mean I need to train MAR on CIFAR-10 for at least 100k epochs?? 😱 (EMA = 0.9999)
No -- 100k iterations (160 epochs on ImageNet with bsz=2048). For CIFAR-10 it should be around 2000 epochs.
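For reference, the iterations-to-epochs conversion only depends on the dataset size and batch size. A quick back-of-the-envelope check (the CIFAR-10 batch size here is an assumption; with 1024 it lands near the ~2000-epoch figure above):

```python
# Back-of-the-envelope iterations <-> epochs conversion for CIFAR-10.
dataset_size = 50_000          # CIFAR-10 training images
batch_size = 1024              # assumed; plug in your actual training batch size
target_iterations = 100_000    # figure suggested above for the EMA model

iters_per_epoch = dataset_size / batch_size           # ~49 at bsz=1024
epochs_needed = target_iterations / iters_per_epoch   # ~2050 at bsz=1024
print(f"~{epochs_needed:.0f} epochs at batch size {batch_size}")
```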
OK, thank you very much.
I am using my self-trained autoencoder as the tokenizer to train MAR on the CIFAR-10 dataset. After 500 epochs the loss dropped to around 0.1, but the generated images are almost all white, with very high pixel values. I noticed that the sample_tokens passed to the AE decoder after sampling have very large values, with a mean of over 1000, while the mean of the original latents is only about 2. I'm not sure why this is happening and would greatly appreciate your help in resolving this issue.
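As a debugging aid (not from the thread), one could compare the statistics of the encoded latents against the sampled tokens right before decoding; `ae`, `model`, and `images` below are placeholders for the autoencoder, the trained MAR model, and a batch of CIFAR-10 images.

```python
import torch

def latent_stats(name: str, z: torch.Tensor) -> None:
    """Print basic statistics of a latent/token tensor."""
    print(f"{name}: mean={z.mean().item():.3f}, std={z.std().item():.3f}, "
          f"min={z.min().item():.3f}, max={z.max().item():.3f}")

# Usage (placeholders: `ae`, `model`, `images`):
#   with torch.no_grad():
#       latent_stats("encoded latents", ae.encode(images))
#       latent_stats("sampled tokens", model.sample_tokens(bsz=images.size(0)))
#
# A mean of ~1000 on sampled tokens versus ~2 on encoded latents points to a
# scale mismatch (e.g. the ImageNet-specific 0.2325 constant being applied to
# latents with a very different std), which would explain the washed-out,
# near-white decoded images.
```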