
Challenges in Memorizing Single or Few Images #49

Open
kifarid opened this issue Sep 18, 2024 · 3 comments

Comments


kifarid commented Sep 18, 2024

Recently, I attempted to train the model on a different domain and decided to start with a simple experiment: testing whether the model could memorize a single image and then sample it correctly.

Here’s what I did:

  • Trained for 20K epochs.
  • Increased diffusion_batch_mul to 256 so that each image contributes many more diffusion-loss samples per step, which should make the single data point easier to fit (see the sketch below).
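
For reference, here is a minimal sketch of how such a single-image setup might look in standard PyTorch. The SingleImageDataset class, file name, and dummy label are illustrative assumptions, not part of the MAR codebase; the only repo-specific knob mentioned in this thread is diffusion_batch_mul.

```python
# Minimal sketch, assuming a standard PyTorch data pipeline: the class name,
# file path, and label below are illustrative, not MAR's actual interface.
import torch
from PIL import Image
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

class SingleImageDataset(Dataset):
    """Yields the same image (with a dummy class label) `length` times per epoch."""
    def __init__(self, path, length=512, image_size=256):
        self.image = Image.open(path).convert("RGB")
        self.length = length
        self.transform = transforms.Compose([
            transforms.Resize(image_size),
            transforms.CenterCrop(image_size),
            transforms.ToTensor(),
            transforms.Normalize([0.5] * 3, [0.5] * 3),
        ])

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        return self.transform(self.image), 0  # fixed label for class-conditional models

# Replicating the single image lets the loader still form large batches.
loader = DataLoader(SingleImageDataset("single_image.png"), batch_size=256, shuffle=True)
```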

Despite these adjustments, the model didn’t work as expected. Initially, I thought the issue might be related to the new domain, suspecting that the VAE might not be tokenizing the data properly. However, I encountered the same issue when trying to memorize two images in the ImageNet dataset as well.

Do you have any intuition or thoughts on why this might be happening?

LTH14 (Owner) commented Sep 18, 2024

I have never run an experiment to memorize data. A few things to check before drawing any conclusion:

  1. Do you still use EMA? With an EMA decay of 0.9999, after 20k steps the EMA parameters will still contain roughly 14% of the original randomly initialized weights (0.9999^20000 ≈ 0.135; see the quick check after this list). I would suggest disabling EMA, and also the learning-rate warmup, for your experiment.
  2. What are the batch size and learning rate? Since the default hyperparameters are all tuned for a large batch size, I would suggest replicating your data so you can train with a batch size of at least 256 or 512.
  3. What does the loss look like? Ideally, the diffusion loss should be near 0 if the model can memorize the data.
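
To make the EMA point concrete: the fraction of the initial weights still present in the EMA after N updates with decay d is d^N. A quick check in plain Python (nothing repo-specific):

```python
# Fraction of the randomly initialized weights still present in the EMA
# after N updates with decay d is d**N.
decay, steps = 0.9999, 20_000
print(f"remaining init fraction: {decay ** steps:.3f}")  # ~0.135, i.e. ~14%
```

So evaluating the EMA weights at 20k steps still mixes in a sizable chunk of the random initialization, which is why evaluating the raw (non-EMA) weights is the safer choice for a memorization test like this.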

kifarid (Author) commented Sep 26, 2024

Hi Tianhong,

Thank you for your prompt response. I am currently studying the generalization properties of image/video generative methods and was particularly curious about how these models scale with the effective training-set size.

To investigate this with MAR, I ran an experiment to explore whether methods like MAR, MaskGIT, and others can successfully memorize and regenerate a single image. For this purpose, I replicated the data to fill a batch size of 512, increased the diffusion batch multiplier, and turned off the EMA. As a result, I reached a near-zero loss (~0.009), but it takes a long time for the FID to start decreasing from its maximum (~600).

Do you have any insight into why this behaviour might occur?

I plan to run more experiments to better understand these models and their generalization capabilities, and will keep you updated with any further findings.

LTH14 (Owner) commented Sep 26, 2024

If you only have one data point, then the FID will definitely be very high. Have you visualized your generated images to see whether they match the single image you provided? A minimal comparison sketch is below.
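
One way to do that check, as a sketch: the file names below are placeholders, and it assumes the target image and the generated samples are saved at the same resolution.

```python
# Sketch: compare generated samples against the single training image.
# "target.png" / "sample_i.png" are placeholder file names; all images are
# assumed to share the same resolution.
import torch
from torchvision.io import read_image
from torchvision.utils import save_image

target = read_image("target.png").float() / 255.0  # (C, H, W) in [0, 1]
samples = torch.stack([read_image(f"sample_{i}.png").float() / 255.0 for i in range(4)])

# Side-by-side grid: target first, then the generated samples.
grid = torch.cat([target.unsqueeze(0), samples], dim=0)
save_image(grid, "memorization_check.png", nrow=grid.shape[0])

# Rough numeric match score: pixel-space MSE per sample.
mse = ((samples - target.unsqueeze(0)) ** 2).mean(dim=(1, 2, 3))
print("per-sample MSE vs. target:", [round(v, 4) for v in mse.tolist()])
```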
