Training loss not converging #13
Comments
Hi, could you give more info about your problem? E.g., how do you put all the datasets together? The test step should have no impact on training. Besides, it seems to me that your loss looks normal?
Hello, I think I found what the issue is and why the loss does not converge.
I think it should be:
In the first configuration, the loss is more impacted by the noise introduced by the augmentation. To test that, I started training on only one image (it should converge within a few iterations).
Without the fix, the comp and lap losses always hover between 0.15 and 0.7.
I think that when the composition loss does not converge, it impacts the Laplacian loss too.
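To illustrate the point above, here is a minimal PyTorch sketch (not the repo's actual code; the tensor names and noise level are made up) of why the composition loss keeps a floor once real-world noise is injected into the input image, even when the predicted alpha is perfect:

```python
import torch
import torch.nn.functional as F

def composition_loss(alpha, fg, bg, image):
    """L1 between the image re-composed from alpha/fg/bg and the target image."""
    composed = alpha * fg + (1.0 - alpha) * bg
    return F.l1_loss(composed, image)

# Toy example: a "perfect" alpha still pays a penalty once noise is injected.
alpha = torch.rand(1, 1, 64, 64)
fg = torch.rand(1, 3, 64, 64)
bg = torch.rand(1, 3, 64, 64)
clean_image = alpha * fg + (1.0 - alpha) * bg
noisy_image = clean_image + 0.05 * torch.randn_like(clean_image)  # real-world-aug style noise

print(composition_loss(alpha, fg, bg, clean_image))  # ~0
print(composition_loss(alpha, fg, bg, noisy_image))  # stuck around the noise level
```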
Thanks for the explanation! The composition loss will be affected if real-world noise is introduced; as you can see in https://github.com/yucornetto/MGMatting/blob/main/code-base/config/MGMatting-RWP-100k.toml, we disable the composition loss when real-world-aug is enabled. As for the lap loss, I am somewhat confused, since fg/image should not be involved in the lap loss computation (see https://github.com/yucornetto/MGMatting/blob/main/code-base/trainer.py#L224), so I am not sure about your point that the comp loss not converging also affects the lap loss. But my suggestion would be to simply set comp_loss_weight to 0 when real-world-aug is enabled.
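For reference, a hedged sketch of what that suggestion amounts to. The helper names and the simplified avg-pool pyramid below are illustrative, not the actual trainer.py implementation (which the links above show in full); the point is only that the lap loss takes alpha alone, and that the comp weight can be zeroed under real-world augmentation:

```python
import torch
import torch.nn.functional as F

def laplacian_loss(pred_alpha, gt_alpha, levels=5):
    """Pyramid L1 loss computed on alpha only (no fg/image involved)."""
    loss = 0.0
    for _ in range(levels):
        down_pred = F.avg_pool2d(pred_alpha, 2)
        down_gt = F.avg_pool2d(gt_alpha, 2)
        up_pred = F.interpolate(down_pred, size=pred_alpha.shape[-2:], mode='bilinear', align_corners=False)
        up_gt = F.interpolate(down_gt, size=gt_alpha.shape[-2:], mode='bilinear', align_corners=False)
        loss = loss + F.l1_loss(pred_alpha - up_pred, gt_alpha - up_gt)
        pred_alpha, gt_alpha = down_pred, down_gt
    return loss

def total_loss(rec, comp, lap, real_world_aug, w_rec=1.0, w_comp=1.0, w_lap=1.0):
    # Maintainer's suggestion: drop the composition term when
    # real-world augmentation corrupts the input image.
    if real_world_aug:
        w_comp = 0.0
    return w_rec * rec + w_comp * comp + w_lap * lap

# Usage sketch
pred_alpha = torch.rand(2, 1, 64, 64)
gt_alpha = torch.rand(2, 1, 64, 64)
rec = F.l1_loss(pred_alpha, gt_alpha)
comp = torch.tensor(0.3)  # whatever the composition term came out to
lap = laplacian_loss(pred_alpha, gt_alpha)
print(total_loss(rec, comp, lap, real_world_aug=True))  # comp term is dropped
```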
Thank you for your answer, I hadn't noticed that you do not use the comp loss when real-world-aug is enabled. One last question: is it a good idea to use a batch size of 40, as you mention in your article? Marco Forte, in his FBA paper, said the batch size must be between 6 and 16 (with BN) for alpha prediction. Thank you for your great work.
Thanks for offering to run the experiments! Regarding your question about the FBA Matting paper, I think they use batch size = 1 + WS + GN? I am not sure about the 6-to-16 part. My personal experience is that when BN is used, a relatively large batch size usually leads to better performance, but I did not run experiments to verify this on the matting task.
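As a side note, here is a small sketch of the normalization trade-off being discussed, assuming a hypothetical make_norm helper and batch-size threshold (neither codebase contains this exact function): BatchNorm benefits from large batches because its statistics are estimated per batch, while GroupNorm (as used with Weight Standardization in FBA Matting) does not depend on batch size.

```python
import torch.nn as nn

def make_norm(num_channels, batch_size, num_groups=32):
    # Illustrative threshold only: with large batches, BN statistics are reliable;
    # with small batches, per-sample GroupNorm is the safer choice.
    if batch_size >= 16:
        return nn.BatchNorm2d(num_channels)
    return nn.GroupNorm(num_groups, num_channels)

block = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False),
    make_norm(64, batch_size=40),  # e.g. the batch size mentioned in the paper
    nn.ReLU(inplace=True),
)
```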
Hello, I tried your training code with the Adobe, Distinctions-646, and real-world datasets. I kept only solid objects and removed all transparent objects such as glasses.
The composition and Laplacian losses do not seem to converge after the warmup step (itr > 5000):
9770/500000], REC: 0.0121, COMP: 0.2987, LAP: 0.1311, lr: 0.001000
I removed the test step during training; I think it has no impact on the training. Am I wrong?
Should I wait for more iterations, or should I adjust some hyperparameters when training with more data than you used?