
Challenges in Memorizing Single or Few Images #49

Open
kifarid opened this issue Sep 18, 2024 · 3 comments

Comments


kifarid commented Sep 18, 2024

Recently, I attempted to train the model on a different domain and decided to start with a simple experiment: testing whether the model could memorize a single image and then sample it correctly.

Here’s what I did:

  • Trained for 20K epochs.
  • Increased diffusion_batch_mul to 256 so that each image contributes many more diffusion-loss samples per step, which should make the single data point easier to fit (see the sketch below).
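
For reference, here is a minimal sketch of how such a single-image setup might look in standard PyTorch. The SingleImageDataset class, file name, and dummy label are illustrative assumptions, not part of the MAR codebase; the only repo-specific knob mentioned in this thread is diffusion_batch_mul.

```python
# Minimal sketch, assuming a standard PyTorch data pipeline: the class name,
# file path, and label below are illustrative, not MAR's actual interface.
import torch
from PIL import Image
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

class SingleImageDataset(Dataset):
    """Yields the same image (with a dummy class label) `length` times per epoch."""
    def __init__(self, path, length=512, image_size=256):
        self.image = Image.open(path).convert("RGB")
        self.length = length
        self.transform = transforms.Compose([
            transforms.Resize(image_size),
            transforms.CenterCrop(image_size),
            transforms.ToTensor(),
            transforms.Normalize([0.5] * 3, [0.5] * 3),
        ])

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        return self.transform(self.image), 0  # fixed label for class-conditional models

# Replicating the single image lets the loader still form large batches.
loader = DataLoader(SingleImageDataset("single_image.png"), batch_size=256, shuffle=True)
```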

Despite these adjustments, the model didn’t work as expected. Initially, I thought the issue might be related to the new domain, suspecting that the VAE might not be tokenizing the data properly. However, I encountered the same issue when trying to memorize two images in the ImageNet dataset as well.

Do you have any intuition or thoughts on why this might be happening?

LTH14 (Owner) commented Sep 18, 2024

I have never run an experiment to memorize data. A few things to check before drawing any conclusion:

  1. Do you still use EMA? With an EMA decay of 0.9999, after 20k steps the EMA parameters will still contain roughly 14% of the original randomly initialized weights (0.9999^20000 ≈ 0.135; see the quick check after this list). I would suggest disabling EMA, and also the learning-rate warmup, for your experiment.
  2. What are the batch size and learning rate? Since the default hyperparameters are all tuned for a large batch size, I would suggest replicating your data so you can train with a batch size of at least 256 or 512.
  3. What does the loss look like? Ideally, the diffusion loss should be near 0 if the model can memorize the data.
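
To make the EMA point concrete: the fraction of the initial weights still present in the EMA after N updates with decay d is d^N. A quick check in plain Python (nothing repo-specific):

```python
# Fraction of the randomly initialized weights still present in the EMA
# after N updates with decay d is d**N.
decay, steps = 0.9999, 20_000
print(f"remaining init fraction: {decay ** steps:.3f}")  # ~0.135, i.e. ~14%
```

So evaluating the EMA weights at 20k steps still mixes in a sizable chunk of the random initialization, which is why evaluating the raw (non-EMA) weights is the safer choice for a memorization test like this.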

kifarid (Author) commented Sep 26, 2024

Hi Tianhong,

Thank you for your prompt response. I am currently studying the generalization properties of image/video generative methods and was particularly curious about how these models scale with the effective training-set size.

To investigate this with MAR, I ran an experiment to explore whether methods like MAR, MaskGIT, and others can successfully memorize and regenerate a single image. For this purpose, I replicated the data to fill a batch size of 512, increased the diffusion batch multiplier, and turned off the EMA. As a result, I reached a near-zero loss (~0.009), but it takes a long time for the FID to start decreasing from its maximum (~600).

Do you have any insight into why this behaviour might occur?

I plan to run more experiments to better understand these models and their generalization capabilities, and will keep you updated with any further findings.

LTH14 (Owner) commented Sep 26, 2024

If you only have one data point, then the FID will definitely be very high. Have you visualized your generated images to see whether they match the single image you provided? A minimal comparison sketch is below.
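
One way to do that check, as a sketch: the file names below are placeholders, and it assumes the target image and the generated samples are saved at the same resolution.

```python
# Sketch: compare generated samples against the single training image.
# "target.png" / "sample_i.png" are placeholder file names; all images are
# assumed to share the same resolution.
import torch
from torchvision.io import read_image
from torchvision.utils import save_image

target = read_image("target.png").float() / 255.0  # (C, H, W) in [0, 1]
samples = torch.stack([read_image(f"sample_{i}.png").float() / 255.0 for i in range(4)])

# Side-by-side grid: target first, then the generated samples.
grid = torch.cat([target.unsqueeze(0), samples], dim=0)
save_image(grid, "memorization_check.png", nrow=grid.shape[0])

# Rough numeric match score: pixel-space MSE per sample.
mse = ((samples - target.unsqueeze(0)) ** 2).mean(dim=(1, 2, 3))
print("per-sample MSE vs. target:", [round(v, 4) for v in mse.tolist()])
```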
